Field of the Invention

The invention is in the field of the detection and treatment of genetic diseases. Specifically, the
invention is directed to the huntingtin gene (also called the IT15 gene), the huntingtin protein encoded
by such gene, and the use of this gene and protein in assays (1) for the detection of a predisposition to
develop Huntington's disease, (2) for the diagnosis of Huntington's disease, (3) for the treatment of
Huntington's disease, and (4) for monitoring the course of such treatment.

Background of the Invention 

Huntington's disease (HD) is a progressive neurodegenerative disorder characterized by motor disturbance,
cognitive loss and psychiatric manifestations (Martin and Gusella, N. Engl. J. Med. 315:1267-1276
(1986)). It is inherited in an autosomal dominant fashion, and affects about 1/10,000 individuals in most
populations of European origin (Harper, P.S. et al., in Huntington's disease, W.B. Saunders, Philadelphia,
1991). The hallmark of HD is a distinctive choreic movement disorder that typically has a subtle,
insidious onset in the fourth to fifth decade of life and gradually worsens over a course of 10 to 20
years until death. Occasionally, HD is expressed in juveniles, typically manifesting with more severe
symptoms including rigidity and a more rapid course. Juvenile onset of HD is associated with a
preponderance of paternal transmission of the disease allele. The neuropathology of HD also displays a
distinctive pattern, with selective loss of neurons that is most severe in the caudate and putamen regions
of the brain. The biochemical basis for neuronal death in HD has not yet been explained, and there is
consequently no treatment effective in delaying or preventing the onset and progression of this
devastating disorder.

The genetic defect causing HD was assigned to chromosome 4 in 1983 in one of the first successes of
linkage analysis using polymorphic DNA markers in man (Gusella et al., Nature 306:234-238 (1983)). Since
that time, we have pursued a location cloning approach to isolating and characterizing the HD gene based
on progressively refining its localization (Gusella, FASEB J. 3:2036-2041 (1989); Gusella, Adv. Hum.
Genet. 20:125-151 (1991)). Among other work, this has involved the generation of new genetic markers in
the region by a number of techniques (Pohl et al., Nucleic Acids Res. 16:9185-9198 (1988); Whaley et al.,
Somat. Cell Mol. Genet. 17:83-91 (1991); MacDonald et al., J. Clin. Inv. 84:1013-1016 (1989)), the
establishment of genetic (MacDonald et al., Neuron 3:183-190 (1989); Allitto et al., Genomics 9:104-112
(1991)) and physical maps of the implicated regions (Bucan et al., Genomics 6:1-15 (1990); Bates et al.,
Nature Genet. 1:180-187 (1992); Doucette-Stamm et al., Somat. Cell Mol. Genet. 17:471-480 (1991); Altherr
et al., Genomics 13:1040-1046 (1992)), the cloning of the 4p telomere of an HD chromosome in a YAC clone
(Bates et al., Am. J. Hum. Genet. 46:762-775 (1990); Youngman et al., Genomics 14:350-356 (1992)), the
establishment of YAC [yeast artificial chromosome] (Bates et al., Nature Genet. 1:180-187 (1992)) and
cosmid (Baxendale et al., in preparation) contigs (a series of overlapping clones which together form a
whole sequence) of the candidate region, as well as the analysis and characterization of a number of
candidate genes from the region (Thompson et al., Genomics 11:1133-1142 (1991); Taylor et al., Nature
Genet. 2:223-227 (1992); Ambrose et al., Hum. Mol. Genet. 1:697-703 (1992)). Analysis of recombination
events in HD kindreds has identified a candidate region of 2.2 Mb, between D4S10 and D4S98 in 4p16.3, as
the most likely position of the HD gene (MacDonald et al., Neuron 3:183-190 (1989); Bates et al., Am. J.
Hum. Genet. 49:7-16 (1991); Snell et al., Am. J. Hum. Genet. 51:357-362 (1992)). Investigations of linkage
disequilibrium between HD and DNA markers in 4p16.3 (Snell et al., J. Med. Genet. 26:673-675 (1989);
Theilman et al., J. Med. Genet. 26:676-681 (1989)) have suggested that multiple mutations have occurred to
cause the disorder (MacDonald et al., Am. J. Hum. Genet. 49:723-734 (1991)). However, haplotype analysis
using multi-allele markers has indicated that at least 1/3 of HD chromosomes are ancestrally related
(MacDonald et al., Nature Genet. 1:99-103 (1992)). The haplotype shared by these HD chromosomes points to
a 500 kb segment between D4S180 and D4S182 as the most likely site of the genetic defect.

Targeting this 500 kb region for saturation with gene transcripts, exon amplification has been used as a
rapid method for obtaining candidate coding sequences (Buckler et al., Proc. Natl. Acad. Sci. USA
88:4005-4009 (1991)). This strategy has previously identified three genes: the α-adducin gene (ADDA)
(Taylor et al., Nature Genet. 2:223-227 (1992)); a putative novel transporter gene (IT10C3) in the distal
portion of this segment; and a novel G protein-coupled receptor kinase gene (IT11) in the central portion
(Ambrose et al., Hum. Mol. Genet. 1:697-703 (1992)). However, no defects implicating any of these genes as
the HD locus have been found.

Summary of the Invention

A large gene, termed herein "huntingtin" or "IT15," has been identified that spans about 210 kb and
encodes a previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a
polymorphic (CAG)n trinucleotide repeat with at least 17 alleles in the normal population, varying from 11
to about 34 CAG copies. On HD chromosomes, the length of the trinucleotide repeat is substantially
increased, for example, to about 37 to at least 73 copies, and shows an apparent correlation with age of
onset; the longest segments are detected in juvenile HD cases. The instability in length of the repeat is
reminiscent of similar trinucleotide repeats in the fragile X syndrome and in myotonic dystrophy (Suthers
et al., J. Med. Genet. 29:761-765 (1992)). The presence of an unstable, expandable trinucleotide repeat on
HD chromosomes in the region of strongest linkage disequilibrium with the disorder suggests that this
alteration underlies the dominant phenotype of HD, and that huntingtin is the HD gene.

The invention is directed to the protein huntingtin, DNA and RNA encoding this protein, and uses thereof. 

Accordingly, in a first embodiment, the invention is directed to purified preparations of the protein
huntingtin, preferably substantially cell-free.

In a further embodiment, the invention is directed to a recombinant construct containing DNA or RNA
encoding huntingtin.

In a further embodiment, the invention is directed to a vector containing such huntingtin-encoding nucleic
acid.

In a further embodiment, the invention is directed to a host transformed with such vector.

In a further embodiment, the invention is directed to a method for producing huntingtin from such
recombinant host.

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease using
such huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method for treating Huntington's disease using
such huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin gene with a (CAG)n repeat
of the normal range of 11-34 copies to the desired cells of such patient in need of such treatment, in a
manner that permits the expression of the huntingtin protein provided by such gene, for a time and in a
quantity sufficient to provide the huntingtin function to the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin antisense gene to the
desired cells of such patient in need of such treatment, in a manner that permits the expression of
huntingtin antisense RNA provided by such gene, for a time and in a quantity sufficient to inhibit
huntingtin mRNA expression in the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin gene to the cells of such
patient in need of such gene; in one embodiment the functional huntingtin gene contains a (CAG)n repeat of
between 11 and 34 copies.

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease or a
predisposition to develop Huntington's disease in a patient, such method comprising determining the number
of (CAG)n repeats present in the huntingtin gene in such patient and especially in the affected tissue of
such patient.
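
By way of illustration only, determining the number of (CAG)n repeats in a sequenced fragment can be
sketched as a search for the longest uninterrupted run of CAG units. The function name and the example
fragment below are hypothetical and are not taken from the sequences disclosed herein; the sketch assumes
a pure, uninterrupted repeat and a read given as a plain string.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of CAG units in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    # Each match is a whole number of CAG units, so its length divided by 3 is the unit count.
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical fragment: 5 flanking bases, 18 CAG units, 4 flanking bases.
fragment = "GTCCC" + "CAG" * 18 + "CAAC"
print(longest_cag_run(fragment))
```

In practice the repeat number is estimated from the size of a PCR product, as in the Figures; the string
search above only illustrates the counting step once a sequence is in hand.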

In a further embodiment, the invention is directed to a method for treating Huntington's disease in a
patient, such method comprising decreasing the number of huntingtin (CAG)n repeats in the huntingtin gene
in the desired cells of such patient.

Brief Description of the Drawings 

FIGURE 1. Long-range restriction map of the HD candidate region. A partial long-range restriction map of
4p16.3 is shown (adapted from Lin et al., Somat. Cell Mol. Genet. 17:481-488 (1991)). The HD candidate
region determined by recombination events is depicted as a hatched line between D4S10 and D4S98. The
portion of the HD candidate region implicated as the site of the defect by linkage disequilibrium
haplotype analysis (MacDonald et al., Nature Genet. 1:99-103 (1992)) is shown as a filled box. Below the
map schematic, the region from D4S180 to D4S182 is expanded to show the cosmid contig (averaging 40
kb/cosmid). The genomic coverage and, where known, the transcriptional orientation (arrow 5' to 3') of the
huntingtin (IT15), IT11, IT10C3 and ADDA genes are also shown. Locus names above the map denote selected
polymorphic markers that have been used in HD families. The positions of D4S127 and D4S95, which form the
core of the haplotype in the region of maximum disequilibrium, are also shown in the cosmid contig.
Restriction sites are given for Not I (N), Mlu I (M) and Nru I (R). Sites displaying complete digestion
are shown in boldface while sites subject to frequent incomplete digestion are shown as lighter symbols.
Brackets around the "N" symbols indicate the presence of additional clustered Not I sites.

FIGURE 2. Northern blot analysis of the huntingtin (IT15) transcript. Results of the hybridization of
IT15A to a Northern blot of RNA from normal (lane 1) and HD homozygous (lanes 2 and 3) lymphoblasts are
shown. A single RNA of about 11 kb was detected in all three samples, with slight apparent variations
being due to unequal RNA concentrations. The HD homozygotes are independent, deriving from a large
American family (lane 2) and the large Venezuelan family (lane 3), respectively. The Venezuelan HD
chromosome has a 4p16.3 haplotype of "5 2 2" defined by a (GT)n polymorphism at D4S127 and VNTR and TaqI
RFLPs at D4S95. The American homozygote carries the most common 4p16.3 haplotype found on HD chromosomes:
"2 11 1" (MacDonald et al., Nature Genet. 1:99-103 (1992)).

FIGURE 3. Schematic of cDNA clones defining the IT15 transcript. Five cDNAs are represented under a
schematic of the composite IT15 sequence. The thin line corresponds to untranslated regions. The thick
line corresponds to coding sequence, assuming initiation of translation at the first Met codon in the open
reading frame. Stars mark the positions of the following exon clones 5' to 3': DL83D3-8, DL83D3-1,
DL228B6-3, DL228B6-5, DL228B6-13, DL69F7-3, DL178H4-6, DL118F5-U and DL134B9-U4.

The composite sequence was derived as follows. From 22 bases 3' to the putative initiator Met ATG, the
sequence was compiled from the cDNA clones and exons shown. There are 9 bases of sequence intervening
between the 3' end of IT16B and the 5' end of IT15B. These were obtained by PCR amplification of first
strand cDNA and sequencing of the PCR product. At the 5' end of the composite sequence, the cDNA clone
IT16C terminates 27 bases upstream of the (CAG)n. However, when IT16C was identified, we had already
generated genomic sequence surrounding the (CAG)n in an attempt to generate new polymorphisms. This
sequence matched the IT16C sequence, and extended it 337 bases upstream, including the apparent Met
initiation codon.

FIGURE 4. Composite sequence of huntingtin (IT15) (SEQ ID NO:5 and SEQ ID NO:6). The composite DNA
sequence of huntingtin (IT15) is shown (SEQ ID NO:5). The predicted protein product (SEQ ID NO:6) is shown 
below the DNA sequence, based on the assumption that translation begins at the first in-frame methionine of 
the long open reading frame. 

FIGURE 5. DNA sequence analysis of the (CAG)n repeat. DNA sequence shown in panels 1, 2 and 3 demonstrates
the variation in the (CAG)n repeat detected in normal cosmid L191F1 (1), cDNA IT16C (2), and HD cosmid
GUS72-2130 (3). Panels 1 and 3 were generated by direct sequencing of cosmid subclones using the following
primer (SEQ ID NO:1):

5' GGC GGG AGA CCG CCA TGG CG 3'.

Panel 2 was generated using the pBSKII T7 primer (SEQ ID NO:2):

5' AAT ACG ACT CAC TAT AG 3'.

FIGURE 6. PCR analysis of the (CAG)n repeat in a Venezuelan HD sibship with some offspring displaying
juvenile onset. Results of PCR analysis of a sibship in the Venezuela HD pedigree are shown. Affected
individuals are represented by shaded symbols. Progeny are shown as triangles for confidentiality. AN1,
AN2 and AN3 mark the positions of the allelic products from normal chromosomes. AE marks the range of PCR
products from the HD chromosome. The intensity of background constant bands, which represent a useful
reference for comparison of the above PCR products, varies with slight differences in PCR conditions. The
PCR products from cosmids L191F1 and GUS72-2130 are loaded in lanes 12 and 13 and have 18 and 48 CAG
repeats, respectively.

FIGURE 7. PCR analysis of the (CAG)n repeat in a Venezuelan HD sibship with offspring homozygous for the
same HD haplotype. Results of PCR analysis of a sibship from the Venezuela HD pedigree in which both
parents are affected by HD are shown. Progeny are shown as triangles for confidentiality and no HD
diagnostic information is given to preserve the blind status of investigators in the Venezuelan
Collaborative Group. AN1 and AN2 mark the positions of the allelic products from normal parental
chromosomes. AE marks the range of PCR products from the HD chromosome. The PCR products from cosmids
L191F1 and GUS72-2130 are loaded in lanes 29 and 30 and have 18 and 48 CAG repeats, respectively.

FIGURE 8. PCR analysis of the (CAG)n repeat in members of an American family with an individual homozygous
for the major HD haplotype. Results of PCR analysis of members of an American family segregating the major
HD haplotype are shown. AN marks the range of normal alleles; AE marks the range of HD alleles. Lanes 1,
3, 4, 5, 7 and 8 represent PCR products from related HD heterozygotes. Lane 2 contains the PCR products
from a member of the family homozygous for the same HD chromosome. Lane 6 contains PCR products from a
normal individual. Pedigree relationships and affected status are not presented to preserve
confidentiality. The PCR products from cosmids L191F1 and GUS72-2130 (which was derived from the
individual represented in lane 2) are loaded in lanes 9 and 10 and have 18 and 48 CAG repeats,
respectively.

FIGURES 9 and 10. PCR analysis of the (CAG)n repeat in two families with supposed new mutations causing
HD. Results of PCR analysis of two families with sporadic HD cases representing putative new mutants are
shown. Individuals in each pedigree are numbered by generation (Roman numerals) and order in the pedigree.
Triangles are used to protect confidentiality. Filled symbols indicate symptomatic individuals. The
different chromosomes segregating in the pedigree have been distinguished by extensive typing with
polymorphic markers in 4p16.3 and have been assigned arbitrary numbers shown above the gel lanes. The
starred chromosomes (3 in Figure 9, 1 in Figure 10) represent the presumed HD chromosome. AN denotes the
range of normal alleles; AE denotes the range of alleles present in affected individuals and in their
unaffected relatives bearing the same chromosomes.

FIGURE 11. Comparison of (CAG)n Repeat Unit Number on Control and HD Chromosomes. Frequency distributions
are shown for the number of (CAG)n repeat units observed on 425 HD chromosomes from 150 independent
families, and from 545 control chromosomes.

FIGURE 12. Comparison of (CAG)n Repeat Unit Number on Maternally and Paternally Transmitted HD
Chromosomes. Frequency distributions are shown for the 134 and 161 HD chromosomes from Figure 11 known to
have been transmitted from the mother (Panel A) and father (Panel B), respectively. The two distributions
differ significantly based on a t-test (t272.3 = 5.34, p<0.0001).

FIGURE 13. Comparison of (CAG)n Repeat Unit Number on HD Chromosomes from Three Large Families with
Different HD Founders. Frequency distributions are shown for 75, 25 and 35 HD chromosomes from the
Venezuelan HD family (Panel A) (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler, N.S., et al.,
Nature 326:194-197 (1987)), Family Z (Panel B) and Family D (Panel C) (Folstein, S.E., et al., Science
229:776-779 (1985)), respectively. The Venezuelan distribution did not differ from the overall HD
chromosome distribution in Figure 11 (t79.7 = 1.58, p<0.12). Both Family Z and Family D did produce
distributions significantly different from the overall HD distribution (U2.2 = 6.73, p<0.0001 and
t458=2.90, p<0.004, respectively).

FIGURE 14. Relationship of (CAG)n Repeat Length in Parents and Corresponding Progeny. Repeat length on the
HD chromosome in mothers (Panel A) or fathers (Panel B) is plotted against the repeat length in the
corresponding offspring. A total of 25 maternal transmissions and 37 paternal transmissions were available
for typing.

FIGURE 15. Amplification of the HD (CAG)n Repeat From Sperm and Lymphoblast DNA. DNA from sperm (S) and
lymphoblasts (L) for 5 members (pairs 1-5) of the Venezuelan HD pedigree aged 24-30 was used for PCR
amplification of the HD (CAG)n repeat. The lower band in each lane derives from the normal chromosome.

FIGURE 16. Relationship of Repeat Unit Length with Age of Onset. Age of onset was established for 234 
diagnosed HD gene carriers and plotted against the repeat length observed on both the HD and normal chro- 
mosomes in the corresponding lymphoblast lines. 

Detailed Description of the Invention

In the following description, reference will be made to various methodologies known to those of skill in the 
art of molecular genetics and biology. Publications and other materials setting forth such known methodologies 
to which reference is made are incorporated herein by reference in their entireties as though set forth in full. 

The IT15 gene described herein is a gene from the proximal portion of the 500 kb segment between human
chromosome 4 markers D4S180 and D4S182. The huntingtin gene spans about 210 kb of DNA and encodes a
previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a polymorphic
(CAG)n trinucleotide repeat with at least 17 alleles in the normal human population, where the repeat
number varies from 11 to about 34 CAG copies in such alleles. This is the gene of the human chromosome
that, as shown herein, suffers the presence of an unstable, expanded number of CAG trinucleotide repeats
in Huntington's disease patients, such that the number of CAG repeats in the huntingtin gene increases to
a range of 37 to at least 86 copies. These results are the basis of a conclusion that the huntingtin gene
encodes a protein called "huntingtin," and that in such huntingtin gene the increase in the number of CAG
repeats to a range of greater than about 37 repeats is the alteration that underlies the dominant
phenotype of Huntington's disease.

As used herein, the huntingtin gene is also called the Huntington's disease gene.
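
The repeat ranges stated in this description (11 to about 34 copies on normal chromosomes; about 37 or
more copies on HD chromosomes) can be restated as a simple lookup. The following is a minimal sketch under
those stated ranges only; the function and label names are hypothetical, the boundaries are described in
the text as approximate, and values falling between the two ranges are reported as unclassified rather
than assigned to either group.

```python
NORMAL_MIN, NORMAL_MAX = 11, 34  # normal range as stated in the text ("11 to about 34")
HD_MIN = 37                      # lower bound of the expanded range as stated in the text


def classify_repeat(n: int) -> str:
    """Map a CAG repeat count onto the ranges given in the description."""
    if n >= HD_MIN:
        return "expanded (HD range)"
    if NORMAL_MIN <= n <= NORMAL_MAX:
        return "normal range"
    return "outside the ranges given in the text"


# Repeat counts reported in the Figures for cosmids L191F1 and GUS72-2130.
for copies in (18, 48):
    print(copies, classify_repeat(copies))
```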

It is to be understood that the description below is applicable to any gene in which a CAG repeat within 
the gene is amplified in an aberrant manner resulting in a change in the regulation, localization, stability or 
translatability of the mRNA containing such amplified CAG repeat that is transcribed from such gene. 

I. Cloning Of Huntingtin DNA And Expression Of Huntingtin Protein

The identification of huntingtin DNA and protein as the altered gene in Huntington's disease patients is
exemplified below. In addition to utilizing the exemplified methods and results for the identification of
deletions of the huntingtin gene in Huntington's disease patients, and for the isolation of the native
human huntingtin gene, the sequence information presented in Figure 4 represents a nucleic acid and
protein sequence that, when inserted into a linear or circular recombinant nucleic acid construct such as
a vector, and used to transform a host cell, will provide copies of huntingtin DNA and huntingtin protein
that are useful sources for the native huntingtin DNA and huntingtin protein for the methods of the
invention. Such methods are known in the art and are briefly outlined below.

The process for genetically engineering the huntingtin coding sequence, for expression under a desired
promoter, is facilitated through the cloning of genetic sequences which are capable of encoding such
huntingtin protein. Such cloning technologies can utilize techniques known in the art for construction of
a DNA sequence encoding the huntingtin protein, such as, for example, polymerase chain reaction
technologies utilizing the huntingtin sequence disclosed herein to isolate the huntingtin gene anew, or an
allele thereof that varies in the number of CAG repeats in such gene, or polynucleotide synthesis methods
for constructing the nucleotide sequence using chemical methods. Expression of the cloned huntingtin DNA
provides huntingtin protein.

As used herein, the term "genetic sequences" is intended to refer to a nucleic acid molecule of DNA or
RNA, preferably DNA. Genetic sequences that are capable of being operably linked to DNA encoding
huntingtin protein, so as to provide for its expression and maintenance in a host cell, are obtained from
a variety of sources, including commercial sources, genomic DNA, cDNA, synthetic DNA, and combinations
thereof. Since the genetic code is universal, it is to be expected that any DNA encoding the huntingtin
amino acid sequence of the invention will be useful to express huntingtin protein in any host, including
prokaryotic (bacterial) hosts and eukaryotic hosts (plants, mammals (especially human), insects, yeast,
and especially any cultured cell populations).
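
The point that different DNA sequences can encode the same amino acid sequence can be illustrated with a
short sketch. The codon table below is a deliberately small subset of the standard genetic code, and the
two DNA sequences and the peptide they encode are hypothetical examples, not taken from Figure 4.

```python
# Subset of the standard genetic code, sufficient for this illustration only.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "CAA": "Q", "CAG": "Q",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two hypothetical sequences that differ at synonymous codon positions.
seq_a = "ATGGCGCAGCAGTAA"
seq_b = "ATGGCTCAACAATGA"
print(translate(seq_a), translate(seq_b))
```

Both sequences translate to the same peptide, which is why a coding sequence may be rewritten with
synonymous codons without changing the encoded protein.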

If it is desired to select anew a gene encoding huntingtin from a library that is thought to contain a
huntingtin gene, such library can be screened and the desired gene sequence identified by any means which
specifically selects for a sequence coding for the huntingtin gene or expressed huntingtin protein such
as, for example, a) by hybridization (under stringent conditions for DNA:DNA hybridization) with an
appropriate huntingtin DNA probe(s) containing a sequence specific for the DNA of this protein, such
sequence being that provided in Figure 4 or a functional derivative thereof, that is, a shortened form
that is of sufficient length to identify a clone containing the huntingtin gene, or b) by
hybridization-selected translational analysis in which native huntingtin mRNA which hybridizes to the
clone in question is translated in vitro and the translation products are further characterized for the
presence of a biological activity of huntingtin, or c) by immunoprecipitation of a translated huntingtin
protein product from the host expressing the huntingtin protein.

When a human allele does not encode the identical sequence to that of Figure 4, it can be isolated and
identified as being huntingtin DNA using the same techniques used herein, and especially PCR techniques to
amplify the appropriate gene with primers based on the sequences disclosed herein. Many polymorphic probes
useful in the fine localization of genes on chromosome 4 are known and available (see, for example,
"ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition, 1991, pages
4-6). For example, a useful D4S10 probe is clone designation pTV20 (ATCC 57605 and 57604); H5.52 (ATCC
61107 and 61106) and F5.53 (ATCC 61108).

Human chromosome 4-specific libraries are known in the art and available from the ATCC for the isolation
of probes ("ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition,
1991, pages 72-73); for example, LL04NS01 and LL04NS02 (ATCC 57719 and ATCC 57718) are useful for these
purposes.

It is not necessary to utilize the exact vector constructs exemplified in the invention; equivalent
vectors can be constructed using techniques known in the art. For example, the sequence of the huntingtin
DNA is provided herein (see Figure 4), and this sequence provides the specificity for the huntingtin gene;
it is only necessary that a desired probe contain this sequence, or a portion thereof sufficient to
provide a positive indication of the presence of the huntingtin gene.

Huntingtin genomic DNA may or may not include naturally occurring introns. Moreover, such genomic DNA can
be obtained in association with the native huntingtin 5' promoter region of the gene sequences and/or with
the native huntingtin 3' transcriptional termination region.

Such huntingtin genomic DNA can also be obtained in association with the genetic sequences which encode
the 5' non-translated region of the huntingtin mRNA and/or with the genetic sequences which encode the
huntingtin 3' non-translated region. To the extent that a host cell can recognize the transcriptional
and/or translational regulatory signals associated with the expression of huntingtin mRNA and protein,
then the 5' and/or 3' non-transcribed regions of the native huntingtin gene, and/or the 5' and/or 3'
non-translated regions of the huntingtin mRNA, can be retained and employed for transcriptional and
translational regulation.

Genomic DNA can be extracted and purified from any host cell, especially a human host cell possessing
chromosome 4, by means well known in the art. Genomic DNA can be shortened by means known in the art, such
as physical shearing or restriction digestion, to isolate the desired huntingtin gene from a chromosomal
region that otherwise would contain more information than necessary for the utilization of the huntingtin
gene in the hosts of the invention. For example, restriction digestion can be utilized to cleave the
full-length sequence at a desired location. Alternatively, or in addition, nucleases that cleave from the
3'-end of a DNA molecule can be used to digest a certain sequence to a shortened form, the desired length
then being identified and purified by polymerase chain reaction technologies, gel electrophoresis, and DNA
sequencing. Such nucleases include, for example, Exonuclease III and Bal31. Other nucleases are well known
in the art.
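
As an illustration of cleaving a sequence at a desired location by restriction digestion, the sketch below
cuts a linear sequence at every occurrence of a recognition site. The three enzymes are those named in the
description of Figure 1, with their commonly documented recognition sequences and top-strand cut
positions; the input sequence is hypothetical, and the sketch ignores the bottom strand, methylation
sensitivity and incomplete digestion.

```python
# Recognition sequence and cut offset on the top strand (offset counted from the site start).
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
    "MluI": ("ACGCGT", 1),    # A^CGCGT
    "NruI": ("TCGCGA", 3),    # TCG^CGA
}


def digest(seq: str, enzyme: str) -> list[str]:
    """Return the top-strand fragments of a linear molecule after complete digestion."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical 26-base sequence containing two NotI sites.
example = "AAAA" + "GCGGCCGC" + "TTTT" + "GCGGCCGC" + "CC"
print([len(fragment) for fragment in digest(example, "NotI")])
```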

Alternatively, if it is known that a certain host cell population expresses huntingtin protein, then cDNA tech- 
niques known in the art can be utilized to synthesize a cDNA copy of the huntingtin mRNA present in such 
population. 

For cloning the genomic or cDNA nucleic acid that encodes the amino acid sequence of the huntingtin
protein into a vector, the DNA preparation can be ligated into an appropriate vector. The DNA sequence
encoding huntingtin protein can be inserted into a DNA vector in accordance with conventional techniques,
including blunt-ending or staggered-ending termini for ligation, restriction enzyme digestion to provide
appropriate termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid
undesirable joining, and ligation with appropriate ligases. Techniques for such manipulations are well
known in the art.

When the huntingtin DNA coding sequence and an operably linked promoter are introduced into a recipient
eukaryotic cell (preferably a human host cell) as a non-replicating, non-integrating molecule, the
expression of the encoded huntingtin protein can occur through the transient (nonstable) expression of the
introduced sequence.

Preferably the coding sequence is introduced on a DNA molecule, such as a closed circular or linear
molecule that is capable of autonomous replication. If integration into the host chromosome is desired, it
is preferable to use a linear molecule. If stable maintenance of the huntingtin gene is desired on an
extrachromosomal element, then it is preferable to use a circular plasmid form, with the appropriate
plasmid element for autonomous replication in the desired host.

The desired gene construct, providing a gene coding for the huntingtin protein, and the necessary
regulatory elements operably linked thereto, can be introduced into a desired host cell by transformation,
transfection, or any method capable of providing the construct to the host cell. A marker gene for the
detection of a host cell that has accepted the huntingtin DNA can be on the same vector as the huntingtin
DNA or on a separate construct for cotransformation with the huntingtin coding sequence construct into the
host cell. The nature of the vector will depend on the host organism.

Suitable selection markers will depend upon the host cell. For example, the marker can provide biocide 
resistance, e.g., resistance to antibiotics, or heavy metals, such as copper, or the like. 

Factors of importance in selecting a particular plasmid or viral vector include: the ease with which
recipient cells that contain the vector can be recognized and selected from those recipient cells which do
not contain the vector; the number of copies of the vector which are desired in a particular host; and
whether it is desirable to be able to "shuttle" the vector between host cells of different species.

When it is desired to use S. cerevisiae as a host for a shuttle vector, preferred S. cerevisiae yeast
plasmids include those containing the 2-micron circle, etc., or their derivatives. Such plasmids are well
known in the art and are commercially available.

Oligonucleotide probes specific for the huntingtin sequence can be used to identify huntingtin clones and
can be designed de novo from the knowledge of the amino acid sequence of the protein as provided herein in
Figure 4, or from the knowledge of the nucleic acid sequence of the DNA encoding such protein as provided
herein in Figure 4, or of a related protein. Alternatively, antibodies can be raised against the
huntingtin protein and used to identify the presence of unique protein determinants in transformants that
express the desired cloned protein.
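
When a probe is designed de novo from an amino acid sequence, the number of distinct oligonucleotides that
could encode a given peptide grows with the codon redundancy of each residue. A minimal sketch of that
count, using the number of codons per amino acid in the standard genetic code, is given below; the
peptides are hypothetical examples chosen only to show the arithmetic.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Count the distinct DNA sequences that could encode the given peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


# A peptide rich in Met/Trp needs far fewer oligonucleotides than one rich in Leu/Ser/Arg.
print(probe_degeneracy("MWQQK"), probe_degeneracy("LSR"))
```

Stretches with low degeneracy are generally the more practical targets for a probe designed from protein
sequence alone.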

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a huntingtin protein if that
nucleic acid contains expression control sequences which contain transcriptional regulatory information
and such sequences are "operably linked" to the huntingtin nucleotide sequence which encodes the
huntingtin polypeptide.

An operable linkage is a linkage in which a sequence is connected to a regulatory sequence (or sequences)
in such a way as to place expression of the sequence under the influence or control of the regulatory sequence.
If the two DNA sequences are a coding sequence and a promoter region sequence linked to the 5' end of the
coding sequence, they are operably linked if induction of promoter function results in the transcription of mRNA
encoding the desired protein and if the nature of the linkage between the two DNA sequences does not (1)
result in the introduction of a frame-shift mutation, (2) interfere with the ability of the expression regulatory
sequences to direct the expression of the protein or antisense RNA, or (3) interfere with the ability of the DNA
template to be transcribed. Thus, a promoter region would be operably linked to a DNA sequence if the promoter
was capable of effecting transcription of that DNA sequence.
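
Condition (1) above, the absence of a frame-shift, reduces to simple arithmetic when a coding sequence is joined downstream of other translated sequence: the number of translated nucleotides that precede it must be a multiple of three. A minimal sketch, assuming hypothetical function names and counting nucleotides from the first base of the initiator codon:

```python
def downstream_frame_offset(upstream_translated_nt: int, linker_nt: int = 0) -> int:
    """Frame (0, 1 or 2) in which a downstream coding sequence would be read.

    upstream_translated_nt: translated nucleotides before the junction,
        counted from the first base of the initiator codon.
    linker_nt: any additional nucleotides introduced at the junction.
    """
    if upstream_translated_nt < 0 or linker_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (upstream_translated_nt + linker_nt) % 3


def is_in_frame(upstream_translated_nt: int, linker_nt: int = 0) -> bool:
    """True if the junction introduces no frame-shift."""
    return downstream_frame_offset(upstream_translated_nt, linker_nt) == 0


print(is_in_frame(60))      # True: 20 codons precede the junction
print(is_in_frame(60, 4))   # False: a 4-nt linker shifts the frame by one
print(is_in_frame(60, 6))   # True: a 6-nt linker adds two codons
```

The check addresses only condition (1); conditions (2) and (3) depend on the regulatory sequences themselves and cannot be verified by counting.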

The precise nature of the regulatory regions needed for gene expression can vary between species or
cell types, but shall in general include, as necessary, 5' non-transcribing and 5' non-translating (non-coding)
sequences involved with initiation of transcription and translation respectively, such as the TATA box, capping
sequence, CAAT sequence, and the like, with those elements necessary for the promoter sequence being
provided by the promoters of the invention. Such transcriptional control sequences can also include enhancer
sequences or upstream activator sequences, as desired.

The vectors of the invention can further comprise other operably linked regulatory elements such as DNA 
elements which confer antibiotic resistance, or origins of replication for maintenance of the vector in one or 
more host cells. 

In another embodiment, especially for maintenance of the vectors of the invention in prokaryotic cells, or
in yeast S. cerevisiae cells, the introduced sequence is incorporated into a plasmid or viral vector capable of
autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for this purpose.
In Bacillus hosts, integration of the desired DNA can be necessary.

Expression of a protein in eukaryotic hosts such as a human cell requires the use of regulatory regions
functional in such hosts. A wide variety of transcriptional and translational regulatory sequences can be
employed, depending upon the nature of the host. Preferably, these regulatory signals are associated in their
native state with a particular gene which is capable of a high level of expression in the specific host cell, such
as a specific human tissue type. In eukaryotes, where transcription is not linked to translation, such control
regions may or may not provide an initiator methionine (AUG) codon, depending on whether the cloned
sequence contains such a methionine. Such regions will, in general, include a promoter region sufficient to direct
the initiation of RNA synthesis in the host cell.

If desired, the non-transcribed and/or non-translated regions 3' to the sequence coding for the huntingtin
protein can be obtained by the above-described cloning methods. The 3'-non-transcribed region of the native
human huntingtin gene can be retained for its transcriptional termination regulatory sequence elements, or
for those elements which direct polyadenylation in eukaryotic cells. Where the native expression control
sequence signals do not function satisfactorily in a host cell, then sequences functional in the host cell can be
substituted.

It may be desired to construct a fusion product that contains a partial coding sequence (usually at the amino
terminal end) of a first protein or small peptide and a second coding sequence (partial or complete) of the
huntingtin protein at the carboxy end. The coding sequence of the first protein can, for example, function as a
signal sequence for secretion of the huntingtin protein from the host cell. Such first protein can also provide
for tissue targeting or localization of the huntingtin protein if it is to be made in one cell type in a multicellular
organism and delivered to another cell type in the same organism. Such fusion protein sequences can be
designed with or without specific protease sites such that a desired peptide sequence is amenable to subsequent
removal.

The expressed huntingtin protein can be isolated and purified from the medium of the host in accordance
with conventional conditions, such as extraction, precipitation, chromatography, affinity chromatography,
electrophoresis, or the like. For example, affinity purification with anti-huntingtin antibody can be used. A protein
having the amino acid sequence shown in Figure 3 can be made, or a shortened peptide of this sequence can
be made, and used to raise antibodies using methods well known in the art. These antibodies can be used
to affinity purify or quantitate huntingtin protein from any desired source.

If it is necessary to extract huntingtin protein from the intracellular regions of the host cells, the host cells
can be collected by centrifugation, or with suitable buffers, lysed, and the protein isolated by column
chromatography, for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic acid-agarose, hydroxyapatite
or by electrophoresis or immunoprecipitation.

II. Use Of Huntingtin For Diagnostic And Treatment Purposes 

It is to be understood that although the following discussion is specifically directed to human patients, the
teachings are also applicable to any animal that expresses huntingtin and in which alteration of huntingtin,
especially the amplification of CAG repeat copy number, leads to a defect in huntingtin gene (structure or
function) or huntingtin protein (structure or function or expression), such that clinical manifestations such as those
seen in Huntington's disease patients are found.

It is also to be understood that the methods referred to herein are applicable to any patient suspected of
developing/having Huntington's disease, whether such condition is manifest at a young age or at a more
advanced age in the patient's life. It is also to be understood that the term "patient" does not imply that symptoms
are present, and patient includes any individual it is desired to examine or treat using the methods of the
invention.

The diagnostic and screening methods of the invention are especially useful for a patient suspected of 
being at risk for developing Huntington's disease based on family history, or a patient in which it is desired to 
diagnose or eliminate the presence of the Huntington's disease condition as a causative agent behind a pa- 
tient's symptoms. 

It is to be understood that to the extent that a patient's symptoms arise due to the alteration of the CAG
repeat copy numbers in the huntingtin gene, even without a diagnosis of Huntington's disease, the methods
of the invention can identify the same as the underlying basis for such condition.

According to the invention, presymptomatic screening of an individual in need of such screening for their
likelihood of developing Huntington's disease is now possible using DNA encoding the huntingtin gene of the
invention, and specifically, DNA having the sequence of the normal human huntingtin gene. The screening
method of the invention allows a presymptomatic diagnosis, including prenatal diagnosis, of the presence of
an aberrant huntingtin gene in such individuals, and thus an opinion concerning the likelihood that such
individual would develop or has developed Huntington's disease or symptoms thereof. This is especially valuable
for the identification of carriers of altered huntingtin gene alleles where such alleles possess an increased
number of CAG repeats in their huntingtin gene, for example, from individuals with a family history of Huntington's
disease. Especially useful for the determination of the number of CAG repeats in the patient's huntingtin gene
is the use of PCR to amplify such region or DNA blotting techniques.
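
Where the amplified region has been sequenced rather than merely sized, the number of CAG repeats can be read directly from the sequence. The following minimal sketch counts the longest uninterrupted run of CAG units in a DNA string; the function name and the example sequence are hypothetical, and the choice to count only uninterrupted runs is an assumption of this sketch rather than a rule stated in the specification.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Length, in repeat units, of the longest uninterrupted (CAG)n run."""
    seq = re.sub(r"\s+", "", sequence).upper()  # tolerate spaces and line breaks
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


# Made-up flanking sequence around 18 CAG units, for illustration only.
example = "TTC" + "CAG" * 18 + "CCG CCA"
print(longest_cag_run(example))  # 18
```

Occurrences of CAG cannot overlap one another, so a left-to-right scan finds every maximal run regardless of reading frame.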

For example, in the method of screening, a tissue sample would be taken from such individual, and
screened for (1) the presence of the "normal" human huntingtin gene, especially for the presence of a "normal"
range of 11-34 CAG copies in such gene. The human huntingtin gene can be characterized based upon, for
example, detection of restriction digestion patterns in "normal" versus the patient's DNA, including RFLP
analysis, using DNA probes prepared against the huntingtin sequence (or a functional fragment thereof) taught in
the invention. Similarly, huntingtin mRNA can be characterized and compared to normal huntingtin mRNA (a)
levels and/or (b) size as found in a human population not at risk of developing Huntington's disease using
similar probes. Lastly, huntingtin protein can be (a) detected and/or (b) quantitated using a biological assay for
huntingtin, for example, using an immunological assay and anti-huntingtin antibodies. When assaying
huntingtin protein, the immunological assay is preferred for its speed. Methods of making antibody against
huntingtin are well known in the art.

An (1) aberrant huntingtin DNA size pattern, such as an aberrant huntingtin RFLP, and/or (2) aberrant
huntingtin mRNA sizes or levels and/or (3) aberrant huntingtin protein levels would indicate that the patient has
developed or is at risk for developing a huntingtin-associated symptom such as a symptom associated with
Huntington's disease.

The screening and diagnostic methods of the invention do not require that the entire huntingtin DNA coding
sequence be used for the probe. Rather, it is only necessary to use a fragment or length of nucleic acid that
is sufficient to detect the presence of the huntingtin gene in a DNA preparation from a normal or affected
individual, the absence of such gene, or an altered physical property of such gene (such as a change in
electrophoretic migration pattern).

Prenatal diagnosis can be performed when desired, using any known method to obtain fetal cells, including
amniocentesis, chorionic villus sampling (CVS), and fetoscopy. Prenatal chromosome analysis can be used
to determine if the portion of chromosome 4 possessing the normal huntingtin gene is present in a
heterozygous state, and PCR amplification or DNA blotting utilized for estimating the size of the CAG repeat in the
huntingtin gene.

The huntingtin DNA can be synthesized; in particular, the CAG repeat region can be amplified and, if desired,
labeled with a radioactive or nonradioactive reporter group, using techniques known in the art (for example,
see Eckstein, F., Ed., Oligonucleotides and Analogues: A Practical Approach, IRL Press at Oxford University
Press, New York (1992); and Kricka, L.J., Ed., Nonisotopic DNA Probe Techniques, Academic Press, San Diego
(1992)).

In one method of treating Huntington's disease in a patient in need of such treatment, functional huntingtin
DNA is provided to the cells of such patient, preferably prior to such symptomatic state that indicates the death
of many of the patient's neuronal cells which it is desired to target with the method of the invention. The
replacement huntingtin DNA is provided in a manner and amount that permits the expression of the huntingtin
protein provided by such gene, for a time and in a quantity sufficient to treat such patient. Many vector systems
are known in the art to provide such delivery to human patients in need of a gene or protein missing from the
cell. For example, adenovirus or retrovirus systems can be used, especially modified retrovirus systems and
especially herpes simplex virus systems. Such methods are provided for, in, for example, the teachings of
Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental Neurology 115:303-316
(1992), WO93/03743 and WO90/09441, each incorporated herein fully by reference. Methods of antisense
strategies are known in the art (see, for example, Antisense Strategies, Baserga, R. et al., Eds., Annals of the
New York Academy of Sciences, volume 660, 1992).

In another method of treating Huntington's disease in a patient in need of such treatment, a gene encoding
an expressible sequence that transcribes huntingtin antisense RNA is provided to the cells of such patient,
preferably prior to such symptomatic state that indicates the death of many of the patient's neuronal cells which
it is desired to target with the method of the invention. The replacement huntingtin antisense RNA gene is
provided in a manner and amount that permits the expression of the antisense RNA provided by such gene, for
a time and in a quantity sufficient to treat such patient, and especially in an amount to inhibit translation of
the aberrant huntingtin mRNA that is being expressed in the cells of such patient. As above, many vector
systems are known in the art to provide such delivery to human patients in need of a gene or protein which is
altered in the patients' cells. For example, adenovirus or retrovirus systems can be used, especially modified
retrovirus systems and especially herpes simplex virus systems. Such methods are provided for, in, for
example, the teachings of Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental
Neurology 115:303-316 (1992), WO93/03743 and WO90/09441, each incorporated herein fully by reference.

Delivery of a DNA sequence encoding a functional huntingtin protein, such as the amino acid encoding
sequence of Figure 4, will effectively replace the altered huntingtin gene of the invention, and inhibit, and/or
stop and/or regress the symptoms that are the result of the interference to huntingtin gene expression due to
an increased number of CAG repeats, such as 37 to 86 repeats in the huntingtin gene as compared to the
11-34 CAG repeats found in human populations not at risk for developing Huntington's disease.
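
The two ranges just stated can be expressed as a simple classification of a measured repeat number. This is a sketch only: the function name and category labels are hypothetical, and the treatment of values that fall in neither stated range (below 11, or 35 to 36) as "outside reported ranges" is an assumption, since the specification gives no rule for them. Because the Examples describe HD alleles of "at least 86" copies, no upper limit is applied to the expanded class.

```python
NORMAL_MIN, NORMAL_MAX = 11, 34   # range reported for populations not at risk
EXPANDED_MIN = 37                 # smallest repeat number reported on HD chromosomes


def classify_cag_repeat(repeats: int) -> str:
    """Place a CAG repeat number in one of the ranges stated in the text."""
    if repeats < 0:
        raise ValueError("repeat number cannot be negative")
    if repeats >= EXPANDED_MIN:
        return "expanded"
    if NORMAL_MIN <= repeats <= NORMAL_MAX:
        return "normal"
    # Assumption: the text gives no interpretation for these values.
    return "outside reported ranges"


for n in (18, 34, 36, 37, 86):
    print(n, classify_cag_repeat(n))
```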

Because Huntington's disease is characterized by a loss of neurons that is most severe in the caudate
and putamen regions of the brain, the method of treatment of the invention is most effective when the
replacement huntingtin gene is provided to the patient early in the course of the disease, prior to the loss of many
neurons due to cell death. For that reason, presymptomatic screening methods according to the invention are
important in identifying those individuals in need of treatment by the method of the invention, and such
treatment preferably is provided while such individual is presymptomatic.

In a further method of treating Huntington's disease in a patient in need of such treatment, such method
provides an antagonist to the aberrant huntingtin protein in the cells of such patient.

Although the method is specifically described for DNA-DNA probes, it is to be understood that RNA
possessing the same sequence information as the DNA of the invention can be used when desired.

For diagnostic assays, huntingtin antibodies are useful for quantitating and evaluating levels of huntingtin
protein, and are especially useful in immunoassays and diagnostic kits.

In another embodiment, the present invention relates to an antibody having binding affinity to a huntingtin
polypeptide, or a binding fragment thereof. In a preferred embodiment, the polypeptide has the amino acid
sequence set forth in SEQ ID NO:6, or mutant or species variation thereof, or at least 7 contiguous amino acids
thereof (preferably, at least 10, 15, 20, or 30 contiguous amino acids thereof). Those which bind selectively to
huntingtin would be chosen for use in methods which could include, but should not be limited to, the analysis
of altered huntingtin expression in tissue containing huntingtin.

The antibodies of the present invention include monoclonal and polyclonal antibodies, as well as fragments
of these antibodies. Antibody fragments which contain the idiotype of the molecule can be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragment; the Fab'
fragments; and the Fab fragments.

Of special interest to the present invention are antibodies to huntingtin (or their functional derivatives)
which are produced in humans, or are "humanized" (i.e., non-immunogenic in a human) by recombinant or other
technology. Humanized antibodies may be produced, for example, by replacing an immunogenic portion of an
antibody with a corresponding, but non-immunogenic portion (i.e., chimeric antibodies) (Robinson, R.R. et al.,
International Patent Publication PCT/US86/02269; Akira, K. et al., European Patent Application 184,187;
Taniguchi, M., European Patent Application 171,496; Morrison, S.L. et al., European Patent Application 173,494;
Neuberger, M.S. et al., PCT Application WO 86/01533; Cabilly, S. et al., European Patent Application 125,023;
Better, M. et al., Science 240:1041-1043 (1988); Liu, A.Y. et al., Proc. Natl. Acad. Sci. USA 84:3439-3443
(1987); Liu, A.Y. et al., J. Immunol. 139:3521-3526 (1987); Sun, L.K. et al., Proc. Natl. Acad. Sci. USA 84:214-218
(1987); Nishimura, Y. et al., Canc. Res. 47:999-1005 (1987); Wood, C.R. et al., Nature 314:446-449
(1985); Shaw et al., J. Natl. Cancer Inst. 80:1553-1559 (1988)). General reviews of "humanized" chimeric
antibodies are provided by Morrison, S.L. (Science 229:1202-1207 (1985)) and by Oi, V.T. et al. (BioTechniques
4:214 (1986)). Suitable "humanized" antibodies can alternatively be produced by CDR or CEA substitution
(Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyen et al., Science 239:1534 (1988); Beidler, C.B. et al.,
J. Immunol. 141:4053-4060 (1988)).

In another embodiment, the present invention relates to a hybridoma which produces the above-described
monoclonal antibody, or binding fragment thereof. A hybridoma is an immortalized cell line which is capable
of secreting a specific monoclonal antibody.

In general, techniques for preparing monoclonal antibodies and hybridomas are well known in the art 
(Campbell, "Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology" 
Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1- 
21 (1980)). 

Any animal (mouse, rabbit, and the like) which is known to produce antibodies can be immunized with the
selected polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous
or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of polypeptide
used for immunization will vary based on the animal which is immunized, the antigenicity of the polypeptide
and the site of injection.

The polypeptide may be modified or administered in an adjuvant in order to increase the peptide
antigenicity. Methods of increasing the antigenicity of a polypeptide are well known in the art. Such procedures
include coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or through the
inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma
cells, and allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which
produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA
assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using
procedures known in the art (Campbell, Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, supra (1984)).

For polyclonal antibodies, antibody-containing antisera are isolated from the immunized animal and are
screened for the presence of antibodies with the desired specificity using one of the above-described
procedures.

In another embodiment of the present invention, the above-described antibodies are detectably labeled.

Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin,
and the like), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, and the like),
fluorescent labels (such as FITC or rhodamine, and the like), paramagnetic atoms, and the like. Procedures for
accomplishing such labeling are well known in the art; for example, see Sternberger et al., J. Histochem.
Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 62:308 (1979); Engval et al., Immunol. 109:129 (1972);
Goding, J. Immunol. Meth. 13:215 (1976). The labeled antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays to identify cells or tissues which express a specific peptide.

In another embodiment of the present invention, the above-described antibodies are immobilized on a solid
support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such
as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling
antibodies to such solid supports are well known in the art (Weir et al., "Handbook of Experimental Immunology"
4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym.
34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays as well as in immunochromatography.

Furthermore, one skilled in the art can readily adapt currently available procedures, as well as the
techniques, methods and kits disclosed above with regard to antibodies, to generate peptides capable of binding
to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see
Hurby et al., "Application of Synthetic Peptides: Antisense Peptides", In Synthetic Peptides, A User's Guide,
W.H. Freeman, NY, pp. 289-307 (1992), and Kaspczak et al., Biochemistry 28:9230-8 (1989).

Anti-peptide peptides can be generated in one of two fashions. First, the anti-peptide peptides can be
generated by replacing the basic amino acid residues found in the huntingtin peptide sequence with acidic residues,
while maintaining hydrophobic and uncharged polar groups. For example, lysine, arginine, and/or histidine
residues are replaced with aspartic acid or glutamic acid and glutamic acid residues are replaced by lysine,
arginine or histidine.
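
The substitution rule just described can be sketched as a simple residue-by-residue mapping over one-letter amino acid codes. The function name, the default replacement residues, and the example peptide below are hypothetical; the text permits either acidic residue and any of the three basic residues, so the defaults are merely one admissible choice. Aspartic acid is left unchanged in this sketch because the text names only glutamic acid as a residue to be replaced by a basic one.

```python
BASIC = set("KRH")    # lysine, arginine, histidine
ACIDIC_OK = set("DE") # admissible replacements for basic residues
BASIC_OK = set("KRH") # admissible replacements for glutamic acid


def anti_peptide(peptide: str, basic_to: str = "D", glu_to: str = "K") -> str:
    """Charge-swapped counterpart of a peptide (one-letter codes).

    Basic residues (K, R, H) become `basic_to`; glutamic acid (E) becomes
    `glu_to`; all other residues, including hydrophobic and uncharged polar
    residues, are kept unchanged.
    """
    if basic_to not in ACIDIC_OK:
        raise ValueError("basic_to must be D or E")
    if glu_to not in BASIC_OK:
        raise ValueError("glu_to must be K, R or H")
    out = []
    for aa in peptide.upper():
        if aa in BASIC:
            out.append(basic_to)
        elif aa == "E":
            out.append(glu_to)
        else:
            out.append(aa)
    return "".join(out)


# Made-up hexapeptide, used only to show the mapping.
print(anti_peptide("AKREHL"))  # ADDKDL
```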

The manner and method of carrying out the present invention can be more fully understood by those of
skill by reference to the following examples, which examples are not intended in any manner to limit the scope
of the present invention or of the claims directed thereto.

Examples

The gene causing Huntington's disease has been mapped in 4p16.3 but has previously eluded identification.
The invention uses haplotype analysis of linkage disequilibrium to spotlight a small segment of 4p16.3
as the likely location of the defect. A new gene, huntingtin (IT15), isolated using cloned "trapped" exons from
a cosmid contig of the target area contains a polymorphic trinucleotide repeat that is expanded and unstable
on HD chromosomes. A (CAG)n repeat longer than the normal range of about 11 to about 34 copies was
observed on HD chromosomes from all 75 disease families examined, comprising a wide range of ethnic
backgrounds and 4p16.3 haplotypes. The (CAG)n repeat, which varies from 37 to at least 86 copies on HD
chromosomes, appears to be located within the coding sequence of a predicted about 348 kDa protein that is widely
expressed but unrelated to any known gene. Thus, the Huntington's disease mutation involves an unstable DNA
segment, similar to those described in fragile X syndrome and myotonic dystrophy, acting in the context of a
novel 4p16.3 gene to produce a dominant phenotype.

The following protocols and experimental details are referenced in the examples that follow. 

HD Cell Lines. Lymphoblast cell lines from HD families of varied ethnic backgrounds used for genetic
linkage and disequilibrium studies (Conneally et al., Genomics 5:304-308 (1989); MacDonald et al., Nature Genet.
1:99-103 (1992)) have been established (Anderson and Gusella, In Vitro 20:856-858 (1984)) in the Molecular
Neurogenetics Unit, Massachusetts General Hospital, over the past 13 years. The Venezuelan HD pedigree
is an extended kindred of over 10,000 members in which all affected individuals have inherited the HD gene
from a common founder (Gusella et al., Nature 306:234-238 (1983); Gusella et al., Science 225:1320-1326
(1984); Wexler et al., Nature 326:194-197 (1987)).

DNA/RNA Blotting. DNA was prepared from cultured cells and DNA blots prepared and hybridized as
described (Gusella et al., Proc. Natl. Acad. Sci. USA 76:5239-5243 (1979); Gusella et al., Nature 306:234-238
(1983)). RNA was prepared and Northern blotting performed as described in Taylor et al., Nature Genet.
3:223-227 (1992).

Construction of Cosmid Contig. The initial construction of the cosmid contig was by chromosome walking
from cosmids L19 and BJ56 (Allitto et al., Genomics 9:104-112 (1991); Lin et al., Somat. Cell Mol. Genet.
17:481-488 (1991)). Two libraries were employed, a collection of Alu-positive cosmids from the reduced cell
hybrid H39-8C10 (Whaley et al., Som. Cell Mol. Genet. 17:83-91 (1991)) and an arrayed flow-sorted
chromosome 4 cosmid library (NM87545) provided by the Los Alamos National Laboratory. Walking was accomplished
by hybridization of whole cosmid DNA, using suppression of repetitive and vector sequences, to
robot-generated high density filter grids (Nizetic, D. et al., Proc. Natl. Acad. Sci. USA 88:3233-3237 (1991); Lehrach, H.
et al., in Genome Analysis: Genetic and Physical Mapping, Volume 1, Davies, K.E. et al., Ed., Cold Spring
Harbor Laboratory Press, 1991, pp. 39-81). Cosmids L1C2, L69F7, L228B6 and L83D3 were first identified by
hybridization of YAC clone YGA2 to the same arrayed library (Bates et al., Nature Genet. 1:180-187 (1992);
Baxendale et al., Nucleic Acids Res. 19:6651 (1991)). HD cosmid GUS72-2130 was isolated by standard
screening of a GUS72 cosmid library using a single-copy probe. Cosmid overlaps were confirmed by a
combination of clone-to-clone and clone-to-genomic hybridizations, single-copy probe hybridizations and
restriction mapping.

cDNA Isolation and Characterization. Exon probes were isolated and cloned as described (Buckler et al.,
Proc. Natl. Acad. Sci. USA 88:4005-4009 (1991)). Exon probes and cDNAs were used to screen human
lambdaZAPII cDNA libraries constructed from adult frontal cortex, fetal brain, adenovirus transformed retinal cell
line RCA, and liver RNA. cDNA clones, PCR products and trapped exons were sequenced as described
(Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)). Direct cosmid sequencing was performed as
described (McClatchey et al., Hum. Mol. Genet. 1:521-527 (1992)). Database searches were performed using
the BLAST network service of the National Center for Biotechnology Information (Altschul et al., J. Mol. Biol.
215:403-410 (1990)).

PCR Assay of the (CAG)n Repeat. Genomic primers (SEQ ID NO:3 and SEQ ID NO:4) flanking the (CAG)n
repeat are:

5' ATG AAG GCC TTC GAG TCC CTC AAG TCC TTC 3'

and

5' AAA CTC ACG GTC GGT GCA GCG GCT CCT CAG 3'.

PCR amplification was performed in a reaction volume of 25 µl using 50 ng of genomic DNA, 5 µg of each
primer, 10 mM Tris, pH 8.3, 5 mM KCl, 2 mM MgCl2, 200 µM dNTPs, 10% DMSO, 0.1 unit Perfectmatch
(Stratagene), 2.5 µCi 32P-dCTP (Amersham) and 1.25 units Taq polymerase (Boehringer Mannheim). After heating
to 94°C for 1.5 minutes, the reaction mix was cycled according to the following program: 40 x
[1' @ 94°C; 1' @ 60°C; 2' @ 72°C]. 5 µl of each PCR reaction was diluted with an equal volume of 95% formamide
loading dye and heat denatured for 2 min. at 95°C. The products were resolved on 5% denaturing
polyacrylamide gels. The PCR product from this reaction using cosmid L191F1 (CAG18) as template was 247 bp. Allele
sizes were estimated relative to a DNA sequencing ladder, the PCR products from sequenced cosmids, and
the invariant background bands often present on the gel. Estimates of allelic variation were obtained by typing
unrelated individuals of largely Western European ancestry, and normal parents of affected HD individuals
from various pedigrees.
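
The reference point given above (a 247 bp product for an allele of 18 CAG units) allows product size and repeat number to be interconverted. The following is a minimal sketch, assuming that alleles differ from the sequenced cosmid only in the number of 3 bp CAG units and that the flanking sequence amplified by the primers is otherwise invariant; the function names are hypothetical.

```python
REFERENCE_PRODUCT_BP = 247  # product from cosmid L191F1
REFERENCE_REPEATS = 18      # CAG units in that cosmid
UNIT_BP = 3                 # length of one CAG unit


def product_size_bp(repeats: int) -> int:
    """Expected PCR product size for an allele with the given repeat number."""
    return REFERENCE_PRODUCT_BP + UNIT_BP * (repeats - REFERENCE_REPEATS)


def repeats_from_size(size_bp: int) -> int:
    """Repeat number implied by a measured product size."""
    diff = size_bp - REFERENCE_PRODUCT_BP
    if diff % UNIT_BP:
        raise ValueError("size does not differ from the reference by whole CAG units")
    return REFERENCE_REPEATS + diff // UNIT_BP


for n in (11, 34, 37, 86):
    print(n, product_size_bp(n))   # 226, 295, 304 and 451 bp respectively
print(repeats_from_size(304))      # 37
```

Under these assumptions the flanking sequence contributes 247 - 3 x 18 = 193 bp to every product.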

Typing of HD and normal chromosomes in Examples 5-8. HD chromosomes were derived from symptomatic
individuals and "at risk" individuals known to be gene carriers by linkage marker analysis. All HD
chromosomes were from members of well-characterized HD families of varied ethnic backgrounds used previously
for genetic linkage and disequilibrium studies (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992);
Conneally, P.M., et al., Genomics 5:304-308 (1989)). Three of the 150 families used were large pedigrees, each
descended from a single founder. The large Venezuelan HD pedigree is an extended kindred of over 13,000
members from which we typed 75 HD chromosomes (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler,
N.S., et al., Nature 326:194-197 (1987)). Two other large families, which have been described previously as
Family Z and Family D, provided 25 and 35 HD chromosomes, respectively (Folstein, S.E., et al., Science
229:776-779 (1985)). Normal chromosomes were taken from married-ins in the HD families and from unrelated normal
individuals from non-HD families. The DNA tested for all individuals except four was prepared from
lymphoblastoid cell lines or fresh blood (Gusella, J.F., et al., Nature 306:234-238 (1983); Anderson and Gusella, In
Vitro 20:856-858 (1984)). In the exceptional cases, DNA was prepared from frozen cerebellum. No difference
in the characteristics of the PCR products was observed between lymphoblastoid, fresh blood, or brain DNAs.
For five members of the Venezuelan pedigree aged 24-30, we also prepared DNA by extracting pelleted sperm
from semen samples. The length of the HD gene (CAG)n repeat for all DNAs was assessed using polymerase
chain reaction amplification.

Statistical analysis as set forth in Examples 5-8. Associations between repeat lengths and onset age were
assessed by Pearson correlation coefficient and by multivariate regression to assess higher order
associations. Comparisons of the distributions of repeat length for all HD chromosomes and those for individual
families were made by analysis of variance and t-test contrasts between groups. The 95% confidence bands were
computed around the regression line utilizing the general linear models procedure of SAS (SAS Institute Inc.,
SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2, SAS Institute Inc., Cary, N.C., pp. 846 (1989)).
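
For orientation, the Pearson correlation coefficient and a simple least-squares line can be computed as sketched below. This is not the SAS general linear models procedure cited above and it omits the multivariate terms and confidence bands; the function names are hypothetical and the numbers are arbitrary toy values chosen to lie exactly on a line, not measurements from any patient or family.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return means and centered sums of squares and cross-products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two samples."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return slope, my - slope * mx


# Arbitrary toy values (NOT data): y decreases linearly as x increases.
x = [1, 2, 3, 4]
y = [8, 6, 4, 2]
print(pearson_r(x, y))           # -1.0
print(least_squares_line(x, y))  # (-2.0, 10.0)
```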

Example 1 

Application of Exon Amplification to Obtain Trapped Cloned Exons

The HD candidate region defined by discrete recombination events in well-characterized families spans
2.2 Mb between D4S10 and D4S98 as shown in Figure 1. The 500 kb segment between D4S180 and D4S182
displays the strongest linkage disequilibrium with HD, with about 1/3 of disease chromosomes sharing a
common haplotype, anchored by multi-allele polymorphisms at D4S127 and D4S95 (MacDonald et al., Nature
Genet. 1:99-103 (1992)). Sixty-four overlapping cosmids spanning about 480 kb from D4S180 to a location
between D4S95 and D4S182 have been isolated by a combination of information from YAC (Baxendale et al.,
Nucleic Acids Res. 19:6651 (1991)) and cosmid probe hybridization to high density filter grids of a chromosome
4 specific library, as well as additional libraries covering this region. Sixteen of these cosmids providing the
complete contig are shown in Figure 1. We have previously used exon amplification to identify ADDA, the
α-adducin locus, IT10C3, a novel putative transporter gene, and IT11, a novel G protein-coupled receptor kinase
gene in the region distal to D4S127 (Figure 1).

We have now applied the exon amplification technique to cosmids from the region of the contig proximal
to D4S127. This procedure produces "trapped" exon clones, which can represent single exons or multiple
exons spliced together, and is an efficient method of obtaining probes for screening cDNA libraries. Individual
cosmids were processed, yielding 9 exon clones in the region from cosmids L134B9 to L181B10.

Two non-overlapping cDNAs were initially isolated using exon probes. IT15A was obtained by screening
a transformed adult retinal cell cDNA library with exon clone DL118F5-U. IT16A was isolated by screening an
adult frontal cortex cDNA library with a pool of three exon clones, DL83D3-8, DL83D3-1, and DL228B6-3. By
Northern blot analysis, we discovered that IT15A and IT16A are in fact different portions of the same large
transcript of approximately 10-11 kb. Figure 2 shows an example of a Northern blot containing RNA from
lymphoblastoid cell lines representing a normal individual and 2 independent homozygotes for HD chromosomes
of different haplotypes. The same approximately 10-11 kb transcript was also detected in RNA from a variety
of human tissues (liver, spleen, kidney, muscle and various regions of adult brain).

IT15A and IT16A were used to "walk" in a number of human tissue cDNA libraries in order to obtain the
full-length transcript. Figure 3 shows a representation of 5 cDNA clones which define the IT15 transcript, under
a schematic of the composite sequence derived as described in the legend. Figure 3 also displays the locations
on the composite sequence of the 9 trapped exon clones.

The composite sequence of IT15, containing the entire predicted coding sequence, spans 10,366 bases
including a tail of 18 As as shown in Figure 4. An open reading frame of 9,432 bases begins with a potential
initiator methionine codon at base 316, located in the context of an optimal translation initiation sequence. An
in-frame stop codon is located 240 bases upstream from this site. The protein product of IT15 is predicted to
be a 348 kDa protein containing 3,144 amino acids. Although the first Met codon in the long open reading frame
has been chosen as the probable initiator codon, we cannot exclude the possibility that translation actually
begins at a more 3' Met codon, producing a smaller protein.
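
The figures quoted above for the IT15 open reading frame are mutually consistent, as the following short arithmetic sketch shows. The variable names are hypothetical, 1-based base numbering is assumed, and the mean residue mass is simply the stated protein mass divided by the stated residue count.

```python
# Values taken from the text above; variable names are illustrative only.
COMPOSITE_LENGTH = 10366     # bases in the composite IT15 sequence
ORF_START = 316              # first base of the putative initiator Met codon (1-based, assumed)
ORF_LENGTH = 9432            # bases in the open reading frame
UPSTREAM_STOP_OFFSET = 240   # bases between the upstream stop codon and the initiator
PROTEIN_MASS_KDA = 348
STATED_RESIDUES = 3144

# A reading frame must contain a whole number of codons.
assert ORF_LENGTH % 3 == 0
codons = ORF_LENGTH // 3                  # 3144, matching the stated residue count

# The upstream stop codon is in frame only if its offset is a multiple of 3.
upstream_stop_in_frame = UPSTREAM_STOP_OFFSET % 3 == 0

orf_last_base = ORF_START + ORF_LENGTH - 1        # 9747, inside the 10,366-base composite
mean_residue_mass_da = PROTEIN_MASS_KDA * 1000 / STATED_RESIDUES

print(codons, orf_last_base, upstream_stop_in_frame, round(mean_residue_mass_da, 1))
```

The resulting mean residue mass of roughly 110 Da is in the range typically seen for proteins, so the predicted size and residue count agree with each other.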

Example 2

Polymorphic Variation of the (CAG)n Trinucleotide Repeat

Near its 5' end, the IT15 sequence contains 21 copies of the triplet CAG, encoding glutamine (Figure 5).
When this sequence was compared with genomic sequences that are known to surround simple sequence
repeats (SSRs) in 4p16.3, it was found that normal cosmid L191F1 had 18 copies of the triplet, indicating that
the (CAG)n repeat is polymorphic (Figure 5). Primers from the genomic sequence flanking the repeat were
chosen to establish a PCR assay for this variation. In the normal population, this SSR polymorphism displays
at least 17 discrete alleles (Table 1) ranging from about 11 to about 34 repeat units. Ninety-eight percent of
the 173 normal chromosomes tested contained repeat lengths between 11 and 24 repeats. Two chromosomes
were detected in the 25-30 repeat range and 2 normal chromosomes had 33 and 34 repeats, respectively. The
overall heterozygosity on normal chromosomes was 80%. Based on sequence analysis of three clones, it
appears that the variation is based entirely on the (CAG)n repeat, but the potential for variation of the smaller
downstream (CCG)7 repeat, which is also included in the PCR product, is also present.
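
As an illustration of how a heterozygosity figure of this kind can be derived from allele counts, the sketch below computes expected heterozygosity as one minus the sum of squared allele frequencies. This is an assumption about the calculation, since the text does not state whether the 80% value is observed or expected; the function name and the sample allele list are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(allele_lengths):
    """Expected heterozygosity H = 1 - sum(p_i^2) over allele frequencies p_i."""
    counts = Counter(allele_lengths)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

# Hypothetical repeat lengths typed on eight chromosomes; NOT data from Table 1.
sample = [17, 17, 18, 19, 19, 20, 22, 24]
print(round(expected_heterozygosity(sample), 3))
```

A locus with many alleles of comparable frequency gives a value approaching 1, while a locus dominated by a single allele gives a value near 0.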

Example 3 

Instability of the Trinucleotide Repeat on HD chromosomes 

Sequence analysis of cosmid GUS72-2130, derived from a chromosome with the major HD haplotype (see
below), revealed 48 copies of the trinucleotide repeat, far greater than the largest normal allele (Figure 5). When
the PCR assay was applied to HD chromosomes, a pattern strikingly different from the normal variation was
observed. HD heterozygotes contained one discrete allelic product in the normal size range, and one PCR
product of much larger size, suggesting that the (CAG)n repeat on HD chromosomes is expanded relative to normal
chromosomes.

Figure 6 shows the patterns observed when the PCR assay was performed on lymphoblast DNA from a
selected nuclear family in a large Venezuelan HD kindred. In this family, DNA marker analysis has shown
previously that the HD chromosome was transmitted from the father (lane 2) to seven children (lanes 3, 5, 6, 7,
8, 10 and 11). The three normal chromosomes present in this mating yielded a PCR product in the normal size
range (AN1, AN2, AN3) that was inherited in a Mendelian fashion. The HD chromosome in the father yielded
a diffuse, "fuzzy"-appearing PCR product slightly smaller than the 48 repeat product of the non-Venezuelan
HD cosmid. Except for the DNA in lane 5, which did not PCR amplify, and in lane 11, which displayed only a
single normal allele, each of the affected children's DNAs yielded a fuzzy PCR product of a different size (AE),
indicating instability of the HD chromosome (CAG)n repeat. Lane 6 contained an HD-specific product slightly
smaller than or equal to that of the father's DNA. Lanes 3, 7, 10 and 8, respectively, contained HD-specific
PCR products of progressively larger size. The absence of an HD-specific PCR product in lane 11 suggested
that this child's DNA possessed a (CAG)n repeat that was too long to amplify efficiently. This was verified by
Southern blot analysis in which the expanded HD allele was easily detected and estimated to contain up to
100 copies of the repeat. Notably, this child had juvenile onset of HD at the very early age of 2 years. The
onset of HD in the father was in his early 40s, typical of most adult HD patients in this population. The onset
ages of children represented by lanes 3, 7, 10 and 8 were 26, 25, 14 and 11 years, respectively, suggesting
a rough correlation between age at onset of HD and the length of the (CAG)n repeat on the HD chromosome.
In keeping with this trend, the offspring represented in lane 6 with the fewest repeats remained asymptomatic
when last examined at the age of 30.

Figure 7 shows PCR analysis for a second sibship from the Venezuelan pedigree in which both parents
are HD heterozygotes carrying the same HD chromosome based on DNA marker studies. Several of the
offspring are HD homozygotes (lanes 6+7, 10+11, 13+14, 17+18, 23+24) as reported previously (Wexler et al.,
Nature 326:194-197 (1987)). Each parent's DNA contained one allele in the normal range (AN1, AN2) which
was transmitted in a Mendelian fashion. The HD-specific products (AE) from the DNA of both parents and
children were all much larger than the normal allelic products and also showed extensive variation in mean size.
A neurologic diagnosis for the offspring in this pedigree was not provided to maintain the blind status of
investigators involved in the ongoing Venezuela HD project, although age of onset again appears to parallel repeat
length. Paired samples under many of the individual symbols represent independent lymphoblast lines initiated
at least one year apart. The variance between paired samples was not as great as between the different
individuals, suggesting that the major differences in size of the PCR products resulted from meiotic transmission.
Of special note is the result obtained in lanes 13 and 14. This HD homozygote's DNA yielded one PCR product
larger and one smaller than the HD-specific PCR products of both parents.

To date, we have tested 75 independent HD families, representing all the different haplotypes reported in
MacDonald et al., Nature Genet. 1:99-103 (1992), and a wide range of ethnic backgrounds. In all 75 cases, a PCR product
larger than the normal size range was produced from the HD chromosome. The sizes of the HD-specific
products ranged from 42 repeat copies to more than 66 copies, with a few individuals failing to yield a product
because of the extreme length of the repeat. In these cases, Southern blot analysis revealed an increase in the
length of an EcoRI fragment with the largest allele approximating 100 copies of the repeat. Figure 8 shows
the variation detected in members of an American family of Irish ancestry in which the major HD haplotype is
segregating. Cosmid GUS72-2130 was cloned from the HD homozygous individual whose DNA was amplified
in lane 2. As was observed in the Venezuelan HD pedigree (Figures 6 and 7), which segregates the disorder
with a different 4p16.3 haplotype, the HD-specific PCR products for this family display considerable size
variation.

Example 4 

New Mutations to HD 

The mutation rate in HD has been reported to be very low. To test whether the expansion of the (CAG)n
repeat is the mechanism by which new HD mutations occur, two pedigrees with sporadic cases of HD have
been examined in which intensive searching failed to reveal a family history of the disorder. In these cases,
pedigree information sufficient to identify the same chromosomes in both the affected individual and
unaffected relatives was gathered. Figures 9 and 10 show the results of PCR analysis of the (CAG)n repeat in these
families. The chromosomes in each family were assigned an arbitrary number based on typing for a large
number of RFLP and SSR markers in 4p16.3 defining distinct haplotypes, and the presumed HD chromosome is
starred.

In family #1, HD first appeared in individual II-3, who transmitted the disorder to III-1 along with
chromosome 3*. This same chromosome was present in II-2, an elderly unaffected individual. PCR analysis revealed
that chromosome 3* from II-2 produced a PCR product at the extreme high end of the normal range (about 36
CAG copies). However, the (CAG)n repeat on the same chromosome in II-3 and III-1 had undergone sequential
expansions to about 44 and about 46 copies, respectively. A similar result was obtained in family #2, where
the presumed HD mutant III-2 had a considerably expanded repeat relative to the same chromosome in II-1
and III-1 (about 49 vs. about 33 CAG copies). In both family #1 and family #2, the ultimate HD chromosome
displays the marker haplotype characteristic of 1/3 of all HD chromosomes, suggesting that this haplotype may
be predisposed to undergoing repeat expansion.

Discussion 

The discovery of an expanded, unstable trinucleotide repeat on HD chromosomes within the IT15 gene
is the basis for utilizing this gene as the HD gene of the invention. These results are consistent with the
interpretation that HD constitutes the latest example of a mutational mechanism that may prove quite common in
human genetic disease. Elongation of a trinucleotide repeat sequence has been implicated previously as the
cause of three quite different human disorders, the fragile X syndrome, myotonic dystrophy and spino-bulbar
muscular atrophy. The initial observations of repeat expansion in HD indicate that this phenomenon shares
features in common with each of these disorders.

In the fragile X syndrome, expression of a constellation of symptoms that includes mental retardation and
a fragile site at Xq27.3 is associated with expansion of a (CGG)n repeat thought to be in the 5' untranslated
region of the FMR1 gene (Fu et al., Cell 67:1047-1058 (1991); Kremer et al., Science 252:1711-1714 (1991);
Verkerk et al., Cell 65:904-914 (1991)). In myotonic dystrophy, a dominant disorder involving muscle weakness
with myotonia that typically presents in early adulthood, the unstable trinucleotide repeat, (CTG)n, is located in
the 3' untranslated region of the myotonin protein kinase gene (Aslanidis et al., Nature 355:548-551 (1992);
Brook et al., Cell 68:799-808 (1992); Buxton et al., Nature 355:547-548 (1992); Fu et al., Science 255:1256-1259
(1992); Harley et al., Lancet 339:1125-1128 (1992); Mahadevan et al., Science 255:1253-1255 (1992)).
The unstable (CAG)n repeat in HD may be within the coding sequence of the IT15 gene, a feature shared with
spino-bulbar muscular atrophy, an X-linked recessive adult-onset disorder of the motor neurons caused by
expansion of a (CAG)n repeat in the coding sequence of the androgen receptor gene (LaSpada et al., Nature
352:77-79 (1991)). The repeat length in both the fragile X syndrome and myotonic dystrophy tends to increase
in successive generations, sometimes quite dramatically. Occasionally, decreases in the average repeat length
are observed (Fu et al., Science 255:1256-1259 (1992); Yu et al., Am. J. Hum. Genet. 50:968-980 (1992);
Bruner et al., N. Engl. J. Med.:476-480 (1993)). The HD trinucleotide repeat is also unstable, usually expanding
when transmitted to the next generation, but contracting on occasion. In HD, as in the other disorders, change
in copy number occurs in the absence of recombination. Compared with the fragile X syndrome, myotonic
dystrophy, and HD, the instability of the disease allele in spino-bulbar muscular atrophy is more limited, and
dramatic expansions of repeat length have not been seen (Biancalana et al., Hum. Mol. Genet. 1:255-258 (1992)).
Expansion of the repeat length in myotonic dystrophy is associated with a particular chromosomal
haplotype, suggesting the existence of a primordial predisposing mutation (Harley et al., Am. J. Hum. Genet. 49:68-75
(1991); Harley et al., Nature 355:545-546 (1992); Ashizawa, Lancet 338:642-643 (1991); and Epstein
(1991)). In the fragile X syndrome, there may be a limited number of ancestral mutations that predispose to
increases in trinucleotide repeat number (Richards et al., Nature Genet. 1:257-260 (1992); Oudet et al., Am.
J. Hum. Genet. 52:297-304 (1993)). The linkage disequilibrium analysis used to identify IT15 indicates that
there are several haplotypes associated with HD, but that at least 1/3 of HD chromosomes are ancestrally
related (MacDonald et al., Nature Genet. 1:99-103 (1992)). These data, combined with the reported low rate of
new mutation to HD (Harper, J. Med. Genet. 89:365-376 (1992)), suggest that expansion of the trinucleotide
repeat may only occur on select chromosomes. The analysis of two families presented herein, in which new
mutation was supposed to have occurred, is consistent with the view that there may be particular normal
chromosomes that have the capacity to undergo expansion of the repeat into the HD range. In each of these
families, a chromosome with a (CAG)n repeat length in the upper end of the normal range was segregating on a
chromosome whose 4p16.3 haplotype matched the most common haplotype seen on HD chromosomes, and
the clinical appearance of HD in these two cases was associated with expansion of the trinucleotide repeat.
The recent application of haplotype analysis to explore the linkage disequilibrium on HD chromosomes
pointed to a portion of a 2.2 Mb candidate region defined by the majority of recombination events described
in HD pedigrees (MacDonald et al., Nature Genet. 1:99-103 (1992)). Previously, the search for the gene was
confounded by three matings in which the genetic inheritance pattern was inconsistent with the remainder of
the family (MacDonald et al., Neuron 3:183-190 (1989b); Pritchard et al., Am. J. Hum. Genet. 50:1218-1230
(1992)). These matings produced apparently affected HD individuals despite the inheritance of only normal
alleles for markers throughout 4p16.3, effectively excluding inheritance of the HD chromosome present in the
rest of the pedigree. Using the PCR assay disclosed above, each of these families was tested and it was
determined that, like other HD kindreds, an expanded allele segregates with HD in affected individuals of all three
pedigrees. However, an expanded allele was not present in those specific individuals with the inconsistent
4p16.3 genotypes. Instead, these individuals displayed the normal alleles expected based on analysis of other
markers in 4p16.3. It is conceivable that these inconsistent individuals do not, in fact, have HD, but some other
disorder. Alternatively, they might represent genetic mosaics in which the HD allele is more heavily represented
and/or more expanded in brain tissue than in the lymphoblast DNA used for genotyping.

The capacity to monitor directly the size of the trinucleotide repeat in individuals "at risk" for HD provides
significant advantages over current methods, eliminating the need for complicated linkage analyses,
facilitating genetic counseling, and extending the applicability of presymptomatic and prenatal diagnosis to "at risk"
individuals with no living affected relatives. However, it is of the utmost importance that the current
internationally accepted guidelines and counseling protocols for testing those "at risk" continue to be observed, and
that samples from unaffected relatives should not be tested inadvertently or without full consent. In the series
of patients examined in this study, there is an apparent correlation between repeat length and age of onset
of the disease, reminiscent of that reported in myotonic dystrophy (Harley et al., Lancet 339:1125-1128 (1992);
Tsilfidis et al., Nature Genet. 1:192-195 (1992)). The largest HD trinucleotide repeat segments were found in
juvenile onset cases, where there is a known preponderance of male transmission (Merrit et al., Excerpta
Medica, Amsterdam, pp. 645-650 (1969)).
The expression of fragile X syndrome is associated with direct inactivation of the FMR1 gene (Pierretti et
al., Cell 66:817-822 (1991); DeBoulle et al., Nature Genet. 3:31-35 (1993)). The recessive inheritance pattern
of spino-bulbar muscular atrophy suggests that in this disorder, an inactive gene product is produced. In
myotonic dystrophy, the manner in which repeat expansion leads to the dominant disease phenotype is unknown.
There are numerous possibilities for the mechanism of pathogenesis of the expanded trinucleotide repeat in
HD. Without intending to be held to this theory, it can nevertheless be noted that, since Wolf-Hirschhorn
patients hemizygous for 4p16.3 do not display features of HD, and IT15 mRNA is present in HD homozygotes,
the expanded trinucleotide repeat does not cause simple inactivation of the gene containing it. The observation
that the phenotype of HD is completely dominant, since homozygotes for the disease allele do not differ
clinically from heterozygotes, has suggested that HD results from a gain of function mutation, in which either the
mRNA product or the protein product of the disease allele would have some new property, or be expressed
inappropriately (Wexler et al., Nature 326:194-197 (1987); Myers et al., Am. J. Hum. Genet. 45:615-618 (1989)).
If the expanded trinucleotide repeat were translated, the consequences on the protein product would be
dramatic, increasing the length of the poly-glutamine stretch near the N-terminus. It is possible, however, that
despite the presence of an upstream Met codon, the normal translational start occurs 3' to the (CAG)n repeat
and there is no poly-glutamine stretch in the protein product. In this case, the repeat would be in the 5'
untranslated region and might be expected to have its dominant effect at the mRNA level. The presence of an
expanded repeat might directly alter regulation, localization, stability or translatability of the mRNA containing
it, and could indirectly affect its counterpart from the normal allele in HD heterozygotes. Other conceivable
scenarios are that the presence of an expanded repeat might alter the effective translation start site for the
HD transcript, thereby truncating the protein, or alter the transcription start site for the IT15 gene, disrupting
control of mRNA expression. Finally, although the repeat is located within the IT15 transcript, the possibility
that it leads to HD by virtue of an action on the expression of an adjacent gene cannot be excluded.

Despite this final caveat, it is consistent with the above results and most likely that the trinucleotide repeat
expansion causes HD by its effect, either at the mRNA or protein level, on the expression and/or structure of
the protein product of the IT15 gene, which has been named huntingtin. Outside of the region of the triplet
repeat, the IT15 DNA sequence detected no significant similarity to any previously reported gene in the
GenBank database. Except for the stretches of glutamine and proline near the N-terminus, the amino acid sequence
displayed no similarity to known proteins, providing no conspicuous clues to huntingtin's function. The
poly-glutamine and poly-proline regions near the N-terminus show similarity with a large number of proteins which
also contain long stretches of these amino acids. It is difficult to assess the significance of such similarities,
although it is notable that many of these are DNA binding proteins and that huntingtin does have a single leucine
zipper motif at residue 1,443. Huntingtin appears to be widely expressed, and yet cell death in HD is confined
to specific neurons in particular regions of the brain.

Example 5

Distribution of Trinucleotide Repeat Lengths on Normal and HD Chromosomes 

The number of copies of the HD triplet repeat has been examined in a total of 425 HD chromosomes from
150 independent families and compared with the copy number of the HD triplet repeat of 545 normal
chromosomes. The results are displayed in Figure 11. Two non-overlapping distributions of repeat length were
observed, wherein the upper end of the normal range and the lower end of the HD range were separated by 3
repeat units. The normal chromosomes displayed 24 alleles producing PCR products ranging from 11 to 34
repeat units, with a median of 19 units (mean 19.71, s.d. 3.21). The HD chromosomes yielded 54 discrete PCR
products corresponding to repeat lengths of 37 to 86 units, with a median of 45 units (mean 46.42, s.d. 6.68).
Of the HD chromosomes, 134 and 161 were known to be maternally or paternally-derived, respectively.

To investigate whether the sex of the transmitting parent might influence the distribution of repeat lengths,
these two sets of chromosomes were plotted separately in Figure 12. The maternally-derived chromosomes
displayed repeat lengths ranging from 37 to 73 units, with a median of 44 (mean 44.93, s.d. 5.14). The
paternally-derived chromosomes had 37 to 86 copies of the repeat unit, with a median of 48 units (mean 49.14, s.d.
8.27). However, a higher proportion of the paternally-derived HD chromosomes had repeat lengths greater than
55 units (16% vs. 2%), suggesting the possibility of a differential effect of paternal versus maternal
transmission.

The data set used excluded chromosomes from a few clinically diagnosed individuals who have previously
been shown not to have inherited the HD chromosome by DNA marker linkage studies (MacDonald, M.E., et
al., Neuron 3:183-190 (1989); Pritchard, C., et al., Am. J. Hum. Genet. 50:1218-1230 (1992)). These individuals
have repeat lengths well within the normal range. Their disease manifestations have not been explained, and
they may represent phenocopies of HD. Regardless of the mechanism involved, the occurrence at low
frequency of such individuals within known HD families must be considered if diagnostic conclusions are based solely
on repeat length.

The control data set also excludes a number of chromosomes from phenotypically normal individuals who
are related to "spontaneous" cases of HD or "new mutations". Chromosomes from these individuals who are
not clinically affected and have no family history of the disorder cannot be designated as HD. However, these
chromosomes cannot be classified as unambiguously normal because they are essentially the same
chromosome as that of an affected relative, the diagnosed "spontaneous" HD proband, except with respect to repeat
length. The lengths of repeat found on these ambiguous chromosomes (34-38 units) span the gap between
the control and HD distributions, confounding a decision on the status of any individual with a repeat in the
high normal to low HD range.
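
The ranges reported in this Example can be summarized in a small lookup, sketched below. The function name and label wording are hypothetical, and the boundaries are the descriptive ranges observed in this data set (11-34 for controls, 37-86 for HD, 34-38 for the ambiguous chromosomes), not diagnostic cut-offs. As noted above, repeat length alone may be insufficient for a diagnostic conclusion.

```python
# Ranges as reported in the text for this data set; descriptive only.
NORMAL_RANGE = (11, 34)      # control chromosomes
HD_RANGE = (37, 86)          # HD chromosomes
AMBIGUOUS_RANGE = (34, 38)   # unaffected relatives of "new mutation" cases

def describe_repeat_length(n_repeats):
    """Return a descriptive label (hypothetical wording) for a (CAG)n repeat count."""
    low, high = AMBIGUOUS_RANGE
    # The ambiguous zone is checked first because it overlaps both other ranges.
    if low <= n_repeats <= high:
        return "indeterminate"
    if NORMAL_RANGE[0] <= n_repeats <= NORMAL_RANGE[1]:
        return "normal range"
    if HD_RANGE[0] <= n_repeats <= HD_RANGE[1]:
        return "HD range"
    return "outside observed ranges"

for n in (19, 36, 45):
    print(n, describe_repeat_length(n))
```

Checking the ambiguous zone first reflects the point made in the text: a repeat of 34 to 38 units cannot be assigned to either distribution with confidence.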

Example 6 

Instability of the Trinucleotide Repeat 

The data in Figure 11 combine repeat lengths from 150 different HD families representing many potentially
independent origins of the defect. To examine the variation in repeat lengths on sets of HD chromosomes known
to descend from a common founder, the data from three large HD kindreds (Gusella, J.F., et al., Nature 306:234-238
(1983); Wexler, N.S., et al., Nature 326:194-197 (1987); Folstein, S.E., et al., Science 229:776-779 (1985))
with different 4p16.3 haplotypes (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992)), typed for 75, 25
and 35 individuals, respectively, were separated. Despite the single origin of the founder HD chromosome
within each pedigree, members of the separate pedigrees display a wide range of repeat lengths (Figure 13). This
instability of the HD chromosome repeat is most prominent in members of a large Venezuelan HD kindred
(panel A) in which the common HD ancestor has produced 10 generations of descendants, numbering over 13,000
individuals. The distribution of repeat lengths in this sampling of the Venezuelan pedigree (median 46, mean
48.26, s.d. 9.3) is not significantly different from that of the larger sample of HD chromosomes from all families.
Panels B and C display results for two extended families in which HD was introduced more recently than in
the Venezuelan kindred. These families have been reported to exhibit different age of onset distributions and
varied phenotypic features of HD (Folstein, S.E., et al., Science 229:776-779 (1985)). Both revealed extensive
repeat length variation, with a median of 41 and 49 repeat units, respectively. The distribution of repeat lengths
in the members of the family in Panel B was significantly different from the distribution of all HD chromosome
repeat lengths (p<0.0001), with a smaller mean of 42.04 repeat units (s.d. 2.82). The repeat distribution from
HD chromosomes of Panel C was also significantly different from the total data set (p<0.004), but with a higher
mean of 49.80 (s.d. 5.86).

Example 7

Parental Source Effects on Repeat Length Variation 

For 62 HD chromosomes in Figure 11, the length of the trinucleotide repeat also could be examined on
the corresponding parental HD chromosome. In 20 of 25 maternal transmissions, and in 31 of 37 paternal
transmissions, the repeat length was altered, indicating considerable instability. A similar phenomenon was not
observed for normal chromosomes, where more than 500 meiotic transmissions revealed no changes in repeat
length, although the very existence of such a large number of normal alleles suggests at least a low degree
of instability.

Figure 14 shows the relationship between the repeat lengths on the HD chromosomes in the affected
parent and corresponding progeny. For the 20 maternally-inherited chromosomes on which the repeat length was
altered, 13 changes were increases in length and 7 were decreases. Both increases and decreases involved
changes of less than 5 repeat units and the overall correlation between the mother's repeat length and that
of her child was r=0.95 (p<0.0001). The average change in repeat length in the 25 maternal transmissions was
an increase of 0.4 repeats.

On paternally-derived chromosomes, the 31 transmissions in which the repeat length changed comprised
26 length increases and 5 length decreases. Although the decreases in size were only slightly smaller than
those observed on maternally-derived chromosomes, ranging from 1 to 3 repeat units, the increases were
often dramatically larger. Thus, the correlation of the repeat length in the father with that of his offspring was
only r=0.35 (p<0.04). The average change in the 37 paternal transmissions was an increase of 9 repeat units.
The maximum length increase observed through paternal transmission was 41 repeat units, a near doubling
of the parental repeat.

For both male and female transmissions, there was no correlation between the size of the parental repeat 
and either the magnitude or frequency of changes. 
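
The transmission counts reported in this Example can be cross-checked with a short tally. The dictionary layout and function names below are hypothetical; the counts themselves are those stated in the text.

```python
# Counts taken from the text above; the data structure is illustrative only.
maternal = {"transmissions": 25, "increases": 13, "decreases": 7}
paternal = {"transmissions": 37, "increases": 26, "decreases": 5}

def changed(group):
    """Number of transmissions in which the repeat length was altered."""
    return group["increases"] + group["decreases"]

def fraction_changed(group):
    """Fraction of all transmissions in which the repeat length was altered."""
    return changed(group) / group["transmissions"]

total_transmissions = maternal["transmissions"] + paternal["transmissions"]
overall_fraction = (changed(maternal) + changed(paternal)) / total_transmissions

print(f"maternal: {changed(maternal)}/25 changed ({fraction_changed(maternal):.0%})")
print(f"paternal: {changed(paternal)}/37 changed ({fraction_changed(paternal):.0%})")
print(f"overall:  {overall_fraction:.0%} of {total_transmissions} transmissions")
```

The increases and decreases sum to the 20 maternal and 31 paternal altered transmissions stated above, and together they account for 51 of the 62 transmissions examined.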
To determine whether the variation in the length of the repeat observed through male transmission of HD
chromosomes is reflected in the male germ cells, we amplified the repeat from sperm DNA and from DNA of
the corresponding lymphoblast from 5 HD gene carriers. The results, shown in Figure 15, reveal striking
differences between the lymphoblast and sperm DNA for the HD chromosome repeat, but not for the repeat on
the normal chromosome. All the sperm donors are members of the Venezuelan HD family and range in age
from 24 to 30 years. Individuals 1 and 2 are siblings with HD chromosome repeat lengths based on lymphoblast
DNA of 45 and 52, respectively. Individuals 3 and 4 are also siblings, with HD repeat lengths of 46 and 49,
respectively. Individual 5, from a different sibship than either of the other two pairs, has an HD repeat of 52
copies. In all 5 cases, the PCR amplification of sperm DNA and lymphoblast DNA yielded identical products
from the normal chromosome. However, in comparison with lymphoblast DNA, the HD gene from sperm DNA
yielded a diffuse array of products. In 3 of the 5 cases (2, 4 and 5), the diffuse array spread to much larger
allelic products than the corresponding lymphoblast product. Subject 2 showed the greatest range of
expansion, with the sperm DNA product extending to over 80 repeat units. Interestingly, the 3 individuals displaying
the greatest variation have the longest repeats and are currently symptomatic. The other two donors have
shorter repeat lengths in the HD range, and remain at risk at this time.

The striking difference in the high repeat length range (>55) between HD chromosomes transmitted from
the father and those transmitted from the mother indicated a potential parental source effect. When this was
examined directly, the HD chromosome repeat length changed in about 85% of transmissions. Most changes
involved a fluctuation of only a few repeat units, with larger increases occurring only in male transmissions.
The greater size increases in male transmission appear to be caused by particular instability of the HD
trinucleotide repeat during male gametogenesis, based on the amplification of the repeat from sperm DNA.
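
The comparison of paternal and maternal transmissions described above can be sketched as a simple tabulation. The records below are invented for illustration and are not data from this study; the function name `summarize` and the record layout (parent sex, parent repeat length, offspring repeat length) are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical (parent sex, parent repeat length, offspring repeat length) records.
transmissions = [
    ("M", 45, 47), ("M", 46, 46), ("M", 49, 62), ("M", 52, 80),
    ("F", 44, 45), ("F", 47, 46), ("F", 50, 50), ("F", 48, 51),
]

def summarize(records):
    """Per parental sex: share of transmissions with a changed length, and largest increase."""
    grouped = defaultdict(list)
    for sex, parent_len, child_len in records:
        grouped[sex].append(child_len - parent_len)
    summary = {}
    for sex, deltas in grouped.items():
        changed = sum(1 for d in deltas if d != 0)
        summary[sex] = {
            "n": len(deltas),
            "fraction_changed": changed / len(deltas),
            "largest_increase": max(deltas),
        }
    return summary

for sex, stats in sorted(summarize(transmissions).items()):
    print(sex, stats)
```

With real pedigree data, the same tabulation would show whether large increases are confined to one parental sex, which is the pattern reported here for male transmissions.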

Example 8 

Relationship between Repeat Length and Age of Onset 

Increased repeat length might correlate with a reduced age of onset of HD. Accordingly, age of onset data
were determined for 234 of the individuals represented in Figure 11. Figure 16 displays the repeat lengths found
on the HD and normal chromosomes of these individuals relative to their age of onset. Indeed, age of onset
is inversely correlated with the HD repeat length. A Pearson correlation coefficient of r = -0.75 (p < 0.0001) was
obtained assuming a linear relationship between age of onset and repeat length. When a polynomial function
was used, a better fit was obtained (R² = 0.61, F = 121.45), suggesting a higher order association between age
of onset and repeat length.
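
The two fits compared above can be illustrated with a short, self-contained sketch. It computes a Pearson coefficient and compares the R² of a linear and a quadratic least-squares fit. The data points are hypothetical, the degree of the polynomial used in the study is not stated (a quadratic is assumed here), and the helper names `pearson_r`, `polyfit` and `r_squared` are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] from the normal equations."""
    m = degree + 1
    # Augmented matrix [X^T X | X^T y] for the monomial basis 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

def r_squared(xs, ys, coeffs):
    """Fraction of the variance in ys explained by the fitted polynomial."""
    my = sum(ys) / len(ys)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical (repeat length, age of onset) pairs, for illustration only.
repeats = [40, 42, 44, 46, 48, 50, 55, 60, 70, 85]
onset = [58, 50, 45, 41, 38, 33, 26, 20, 11, 4]

# Centre the predictor so the normal equations stay well conditioned.
mean_repeat = sum(repeats) / len(repeats)
centred = [x - mean_repeat for x in repeats]

r = pearson_r(repeats, onset)
r2_linear = r_squared(centred, onset, polyfit(centred, onset, 1))
r2_quadratic = r_squared(centred, onset, polyfit(centred, onset, 2))
print(f"r = {r:.2f}, linear R^2 = {r2_linear:.2f}, quadratic R^2 = {r2_quadratic:.2f}")
```

For a straight-line fit R² equals r², so the reported r = -0.75 corresponds to R² of about 0.56, against which the polynomial R² of 0.61 is the improvement noted in the text.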

There is considerable variation in the age of onset associated with any specific number of repeat units,
particularly for trinucleotide repeats in the 37-52 unit zone (88% of HD chromosomes) where onset ranged
from 15 to 75 years. In this range, a linear relationship between age of onset and repeat length provided as
good a fit as a higher order relationship. The 95% confidence interval surrounding the predicted regression
line was estimated at ±18 years. In the 37 to 52 unit range, the association of repeat length to onset age is
only half as strong as in the overall distribution (r = -0.40, p < 0.0001), indicating that much of the predictive power
is contributed by repeats longer than 52 units. In this increased range, onset is likely to be very young and
consequently not relevant to most persons seeking testing.

For the 178 cases in the 37-52 repeat unit range for which it was possible to subdivide the data set based
on parental origin of the HD gene, multivariate regression analysis suggested a significant effect of parental
origin on age of onset (p<0.05) independent of repeat length in this range. HD gene carriers from maternal
transmissions had an average age of onset two years later than those from paternal transmissions.
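
A regression of this kind can be sketched by adding an indicator for parental origin alongside repeat length. The example below uses synthetic, noise-free cases constructed so that maternal origin adds exactly two years; it shows only how the two effects are separated and does not reproduce the study's data or its significance test. The helper names `solve` and `ols` are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic (repeat length, maternal origin flag) cases in the 37-52 unit range.
cases = [(38, 0), (40, 1), (42, 0), (44, 1), (46, 0), (48, 1), (50, 0), (52, 1),
         (39, 1), (43, 0), (47, 1), (51, 0)]
# Onset ages built from an assumed model: maternal origin adds exactly 2 years.
onset = [100 - 1.2 * repeat + 2 * maternal for repeat, maternal in cases]

# Design matrix: intercept, repeat length centred on 45, maternal-origin indicator.
design = [[1.0, repeat - 45.0, float(maternal)] for repeat, maternal in cases]
intercept, slope, maternal_effect = ols(design, onset)
print(f"slope = {slope:.2f} years/repeat, maternal effect = {maternal_effect:.2f} years")
```

Because repeat length and parental origin enter the same model, the coefficient on the indicator is the difference in onset age at a fixed repeat length, which is the sense in which the effect is independent of repeat length.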

In both univariate and multivariate analyses, no association between age of onset and the repeat length 
on the normal chromosome was detected, either in the total data set, or when it was subdivided into chromo- 
somes of maternal or paternal origin. 

All publications mentioned hereinabove are hereby incorporated in their entirety by reference.

While the foregoing invention has been described in some detail for purposes of clarity and understanding, 
it will be appreciated by one skilled in the art from a reading of this disclosure that various changes in form 
and detail can be made without departing from the true scope of the invention and appended claims.

(1) GENERAL INFORMATION:

(i) APPLICANT: THE GENERAL HOSPITAL CORPORATION
Fruit Street
Boston, Massachusetts 02114
United States of America

(ii) TITLE OF INVENTION: Huntingtin DNA, Protein And Uses Thereof

(iii) NUMBER OF SEQUENCES: 6

(iv) CORRESPONDENCE ADDRESS:
(A) KILBURN & STRODE
(B) 30 JOHN STREET
(C) LONDON
(D) GREAT BRITAIN
(E) WC1N 2DD

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) 7th March 1994

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/085,000
(B) FILING DATE: 01 JULY 1993

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/027,498
(B) FILING DATE: 05 MARCH 1993

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GGCGGGAGAC CGCCATGGCG 20

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
AATACGACTC ACTATAG 17

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
ATGAAGGCCT TCGAGTCCCT CAAGTCCTTC 30

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
AAACTCACGG TCGGTGCAGC GGCTCCTCAG 30

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10366 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 316..9748

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TTGCTGTGTG AGGCAGAACC TGCGGGGGCA GGGGCGGGCT GGTTCCCTGG CCAGCCATTG 60
GCAGAGTCCG CAGGCTAGGG CTGTCAATCA TGCTGGCCGG CGTGGCCCCG CCTCCGCCGG 120
CGCGGCCCCG CCTCCGCCGG CGCACGTCTG GGACGCAAGG CGCCGTGGGG GCTGCCGGGA 180
CGGGTCCAAG ATGGACGGCC GCTCAGGTTC TGCTTTTACC TGCGGCCCAG AGCCCCATTC 240
ATTGCCCCGG TGCTGAGCGG CGCCGCGAGT CGGCCCGAGG CCTCCGGGGA CTGCCGTGCC 300

GGGCGGGAGA CCGCC ATG GCG ACC CTG GAA AAG CTG ATG AAG GCC TTC GAG 351
Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu
1 5 10

TCC CTC AAG TCC TTC CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG 399
Ser Leu Lys Ser Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln
15 20 25

CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CCG CCA CCG CCG 447
Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Pro
30 35 40

CCG CCG CCG CCG CCG CCT CCT CAG CTT CCT CAG CCG CCG CCG CAG GCA 495
Pro Pro Pro Pro Pro Pro Pro Gln Leu Pro Gln Pro Pro Pro Gln Ala
45 50 55 60

CAG CCG CTG CTG CCT CAG CCG CAG CCG CCC CCG CCG CCG CCC CCG CCG 543
Gln Pro Leu Leu Pro Gln Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro
65 70 75

CCA CCC GGC CCG GCT GTG GCT GAG GAG CCG CTG CAC CGA CCA AAG AAA 591
Pro Pro Gly Pro Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys
80 85 90

GAA CTT TCA GCT ACC AAG AAA GAC CGT GTG AAT CAT TGT CTG ACA ATA 639
Glu Leu Ser Ala Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile
95 100 105

TGT GAA AAC ATA GTG GCA CAG TCT GTC AGA AAT TCT CCA GAA TTT CAG 687
Cys Glu Asn Ile Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln
110 115 120

AAA CTT CTG GGC ATC GCT ATG GAA CTT TTT CTG CTG TGC AGT GAT GAC 735
Lys Leu Leu Gly Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp
125 130 135 140

GCA GAG TCA GAT GTC AGG ATG GTG GCT GAC GAA TGC CTC AAC AAA GTT 783
Ala Glu Ser Asp Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val
145 150 155

ATC AAA GCT TTG ATG GAT TCT AAT CTT CCA AGG TTA CAG CTC GAG CTC 831
Ile Lys Ala Leu Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu
160 165 170

TAT AAG GAA ATT AAA AAG AAT GGT GCC CCT CGG AGT TTG CGT GCT GCC 879
Tyr Lys Glu Ile Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala
175 180 185

CTG TGG AGG TTT GCT GAG CTG GCT CAC CTG GTT CGG CCT CAG AAA TGC 927
Leu Trp Arg Phe Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys
190 195 200

AGG CCT TAC CTG GTG AAC CTT CTG CCG TGC CTG ACT CGA ACA AGC AAG 975
Arg Pro Tyr Leu Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys
205 210 215 220

AGA CCC GAA GAA TCA GTC CAG GAG ACC TTG GCT GCA GCT GTT CCC AAA 1023
Arg Pro Glu Glu Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys
225 230 235

ATT ATG GCT TCT TTT GGC AAT TTT GCA AAT GAC AAT GAA ATT AAG GTT 1071
Ile Met Ala Ser Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val
240 245 250

TTG TTA AAG GCC TTC ATA GCG AAC CTG AAG TCA AGC TCC CCC ACC ATT 1119
Leu Leu Lys Ala Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile
255 260 265

CGG CGG ACA GCG GCT GGA TCA GCA GTG AGC ATC TGC CAG CAC TCA AGA 1167
Arg Arg Thr Ala Ala Gly Ser Ala Val Ser Ile Cys Gln His Ser Arg
270 275 280

AGG ACA CAA TAT TTC TAT AGT TGG CTA CTA AAT GTG CTC TTA GGC TTA 1215
Arg Thr Gln Tyr Phe Tyr Ser Trp Leu Leu Asn Val Leu Leu Gly Leu
285 290 295 300

CTC GTT CCT GTC GAG GAT GAA CAC TCC ACT CTG CTG ATT CTT GGC GTG 1263
Leu Val Pro Val Glu Asp Glu His Ser Thr Leu Leu Ile Leu Gly Val
305 310 315

CTG CTC ACC CTG AGG TAT TTG GTG CCC TTG CTG CAG CAG CAG GTC AAG 1311
Leu Leu Thr Leu Arg Tyr Leu Val Pro Leu Leu Gln Gln Gln Val Lys
320 325 330

GAC ACA AGC CTG AAA GGC AGC TTC GGA GTG ACA AGG AAA GAA ATG GAA 1359
Asp Thr Ser Leu Lys Gly Ser Phe Gly Val Thr Arg Lys Glu Met Glu
335 340 345

GTC TCT CCT TCT GCA GAG CAG CTT GTC CAG GTT TAT GAA CTG ACG TTA 1407
Val Ser Pro Ser Ala Glu Gln Leu Val Gln Val Tyr Glu Leu Thr Leu
350 355 360

CAT CAT ACA CAG CAC CAA GAC CAC AAT GTT GTG ACC GGA GCC CTG GAG 1455
His His Thr Gln His Gln Asp His Asn Val Val Thr Gly Ala Leu Glu
365 370 375 380

CTG TTG CAG CAG CTC TTC AGA ACG CCT CCA CCC GAG CTT CTG CAA ACC 1503
Leu Leu Gln Gln Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr
385 390 395

CTG ACC GCA GTC GGG GGC ATT GGG CAG CTC ACC GCT GCT AAG GAG GAG 1551
Leu Thr Ala Val Gly Gly Ile Gly Gln Leu Thr Ala Ala Lys Glu Glu
400 405 410

TCT GGT GGC CGA AGC CGT AGT GGG AGT ATT GTG GAA CTT ATA GCT GGA 1599
Ser Gly Gly Arg Ser Arg Ser Gly Ser Ile Val Glu Leu Ile Ala Gly
415 420 425

GGG GGT TCC TCA TGC AGC CCT GTC CTT TCA AGA AAA CAA AAA GGC AAA 1647
Gly Gly Ser Ser Cys Ser Pro Val Leu Ser Arg Lys Gln Lys Gly Lys
430 435 440

GTG CTC TTA GGA GAA GAA GAA GCC TTG GAG GAT GAC TCT GAA TCG AGA 1695
Val Leu Leu Gly Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg
445 450 455 460

TCG GAT GTC AGC AGC TCT GCC TTA ACA GCC TCA GTG AAG GAT GAG ATC 1743
Ser Asp Val Ser Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu Ile
465 470 475

AGT GGA GAG CTG GCT GCT TCT TCA GGG GTT TCC ACT CCA GGG TCA GCA 1791
Ser Gly Glu Leu Ala Ala Ser Ser Gly Val Ser Thr Pro Gly Ser Ala
480 485 490

GGT CAT GAC ATC ATC ACA GAA CAG CCA CGG TCA CAG CAC ACA CTG CAG 1839
Gly His Asp Ile Ile Thr Glu Gln Pro Arg Ser Gln His Thr Leu Gln
495 500 505

GCG GAC TCA CTG GAT CTG GCC AGC TGT GAC TTG ACA AGC TCT GCC ACT 1887
Ala Asp Ser Leu Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr
510 515 520

GAT GGG GAT GAG GAG GAT ATC TTG AGC CAC AGC TCC AGC CAG GTC AGC 1935
Asp Gly Asp Glu Glu Asp Ile Leu Ser His Ser Ser Ser Gln Val Ser
525 530 535 540

GCC GTC CCA TCT GAC CCT GCC ATG GAC CTG AAT GAT GGG ACC CAG GCC 1983
Ala Val Pro Ser Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gln Ala
545 550 555

TCG TCG CCC ATC AGC GAC AGC TCC CAG ACC ACC ACC GAA GGG CCT GAT 2031
Ser Ser Pro Ile Ser Asp Ser Ser Gln Thr Thr Thr Glu Gly Pro Asp
560 565 570

TCA GCT GTT ACC CCT TCA GAC AGT TCT GAA ATT GTG TTA GAC GGT ACC 2079
Ser Ala Val Thr Pro Ser Asp Ser Ser Glu Ile Val Leu Asp Gly Thr
575 580 585

GAC AAC CAG TAT TTG GGC CTG CAG ATT GGA CAG CCC CAG GAT GAA GAT 2127
Asp Asn Gln Tyr Leu Gly Leu Gln Ile Gly Gln Pro Gln Asp Glu Asp
590 595 600

GAG GAA GCC ACA CGT ATT CTT CCT CAT CAA CCC TCC CAG CCC TTC ACG 2175
Glu Glu Ala Thr Gly Ile Leu Pro Asp Glu Ala Ser Glu Ala Phe Arg
605 610 615 620

AAC TCT TCC ATG GCC CTT CAA CAG GCA CAT TTA TTG AAA AAC ATG AGT 2223
Asn Ser Ser Met Ala Leu Gln Gln Ala His Leu Leu Lys Asn Met Ser
625 630 635

CAC TGC AGG CAG CCT TCT GAC AGC AGT GTT GAT AAA TTT GTG TTG AGA 2271
His Cys Arg Gln Pro Ser Asp Ser Ser Val Asp Lys Phe Val Leu Arg
640 645 650

GAT GAA GCT ACT GAA CCG GGT GAT CAA GAA AAC AAG CCT TGC CGC ATC 2319
Asp Glu Ala Thr Glu Pro Gly Asp Gln Glu Asn Lys Pro Cys Arg Ile
655 660 665

AAA GGT GAC ATT GGA CAG TCC ACT GAT GAT GAC TCT GCA CCT CTT GTC 2367
Lys Gly Asp Ile Gly Gln Ser Thr Asp Asp Asp Ser Ala Pro Leu Val
670 675 680

CAT TCT GTC CGC CTT TTA TCT GCT TCG TTT TTG CTA ACA GGG GGA AAA 2415
His Ser Val Arg Leu Leu Ser Ala Ser Phe Leu Leu Thr Gly Gly Lys
685 690 695 700

AAT GTG CTG GTT CCG GAC AGG GAT GTG AGG GTC AGC GTG AAG GCC CTG 2463
Asn Val Leu Val Pro Asp Arg Asp Val Arg Val Ser Val Lys Ala Leu
705 710 715

GCC CTC AGC TGT GTG GGA GCA GCT GTG GCC CTC CAC CCG GAA TCT TTC 2511
Ala Leu Ser Cys Val Gly Ala Ala Val Ala Leu His Pro Glu Ser Phe
720 725 730

TTC AGC AAA CTC TAT AAA GTT CCT CTT GAC ACC ACG GAA TAC CCT GAG 2559
Phe Ser Lys Leu Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pro Glu
735 740 745

GAA CAG TAT GTC TCA GAC ATC TTG AAC TAC ATC GAT CAT GGA GAC CCA 2607
Glu Gln Tyr Val Ser Asp Ile Leu Asn Tyr Ile Asp His Gly Asp Pro
750 755 760

CAG GTT CGA GGA GCC ACT GCC ATT CTC TGT GGG ACC CTC ATC TGC TCC 2655
Gln Val Arg Gly Ala Thr Ala Ile Leu Cys Gly Thr Leu Ile Cys Ser
765 770 775 780

ATC CTC AGC AGG TCC CGC TTC CAC GTG GGA GAT TGG ATG GGC ACC ATT 2703
Ile Leu Ser Arg Ser Arg Phe His Val Gly Asp Trp Met Gly Thr Ile
785 790 795

AGA ACC CTC ACA GGA AAT ACA TTT TCT TTG GCG GAT TGC ATT CCT TTG 2751
Arg Thr Leu Thr Gly Asn Thr Phe Ser Leu Ala Asp Cys Ile Pro Leu
800 805 810

CTG CGG AAA ACA CTG AAG GAT GAG TCT TCT GTT ACT TGC AAG TTA GCT 2799
Leu Arg Lys Thr Leu Lys Asp Glu Ser Ser Val Thr Cys Lys Leu Ala
815 820 825

TGT ACA GCT GTG AGG AAC TGT GTC ATG AGT CTC TGC AGC AGC AGC TAC 2847
Cys Thr Ala Val Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr
830 835 840

AGT GAG TTA GGA CTG CAG CTG ATC ATC GAT GTG CTG ACT CTG AGG AAC 2895
Ser Glu Leu Gly Leu Gln Leu Ile Ile Asp Val Leu Thr Leu Arg Asn
845 850 855 860

AGT TCC TAT TGG CTG GTG AGG ACA GAG CTT CTG GAA ACC CTT GCA GAG 2943
Ser Ser Tyr Trp Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu
865 870 875

ATT GAC TTC AGG CTG GTG AGC TTT TTG GAG GCA AAA GCA GAA AAC TTA 2991
Ile Asp Phe Arg Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu
880 885 890

CAC AGA GGG GCT CAT CAT TAT ACA GGG CTT TTA AAA CTG CAA GAA CGA 3039
His Arg Gly Ala His His Tyr Thr Gly Leu Leu Lys Leu Gln Glu Arg
895 900 905

GTG CTC AAT AAT GTT GTC ATC CAT TTG CTT GGA GAT GAA GAC CCC AGG 3087
Val Leu Asn Asn Val Val Ile His Leu Leu Gly Asp Glu Asp Pro Arg
910 915 920

GTG CGA CAT GTT GCC GCA GCA TCA CTA ATT AGG CTT GTC CCA AAG CTG 3135
Val Arg His Val Ala Ala Ala Ser Leu Ile Arg Leu Val Pro Lys Leu
925 930 935 940

TTT TAT AAA TGT GAC CAA GGA CAA GCT GAT CCA GTA GTG GCC GTG GCA 3183
Phe Tyr Lys Cys Asp Gln Gly Gln Ala Asp Pro Val Val Ala Val Ala
945 950 955

AGA GAT CAA AGC AGT GTT TAC CTG AAA CTT CTC ATG CAT GAG ACG CAG 3231
Arg Asp Gln Ser Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gln
960 965 970

CCT CCA TCT CAT TTC TCC GTC AGC ACA ATA ACC AGA ATA TAT AGA GGC 3279
Pro Pro Ser His Phe Ser Val Ser Thr Ile Thr Arg Ile Tyr Arg Gly
975 980 985

TAT AAC CTA CTA CCA AGC ATA ACA GAC GTC ACT ATG GAA AAT AAC CTT 3327
Tyr Asn Leu Leu Pro Ser Ile Thr Asp Val Thr Met Glu Asn Asn Leu
990 995 1000

TCA AGA GTT ATT GCA GCA GTT TCT CAT GAA CTA ATC ACA TCA ACC ACC 3375
Ser Arg Val Ile Ala Ala Val Ser His Glu Leu Ile Thr Ser Thr Thr
1005 1010 1015 1020

AGA GCA CTC ACA TTT GGA TGC TGT GAA GCT TTG TGT CTT CTT TCC ACT 3423
Arg Ala Leu Thr Phe Gly Cys Cys Glu Ala Leu Cys Leu Leu Ser Thr
1025 1030 1035

GCC TTC CCA GTT TGC ATT TGG AGT TTA GGT TGG CAC TGT GGA GTG CCT 3471
Ala Phe Pro Val Cys Ile Trp Ser Leu Gly Trp His Cys Gly Val Pro
1040 1045 1050

CCA CTG AGT GCC TCA GAT GAG TCT AGG AAG AGC TGT ACC GTT GGG ATG 3519
Pro Leu Ser Ala Ser Asp Glu Ser Arg Lys Ser Cys Thr Val Gly Met
1055 1060 1065

GCC ACA ATG ATT CTG ACC CTG CTC TCG TCA GCT TGG TTC CCA TTG GAT 3567
Ala Thr Met Ile Leu Thr Leu Leu Ser Ser Ala Trp Phe Pro Leu Asp
1070 1075 1080

CTC TCA GCC CAT CAA GAT GCT TTG ATT TTG GCC GGA AAC TTG CTT GCA 3615
Leu Ser Ala His Gln Asp Ala Leu Ile Leu Ala Gly Asn Leu Leu Ala
1085 1090 1095 1100

GCC AGT GCT CCC AAA TCT CTG AGA AGT TCA TGG GCC TCT GAA GAA GAA 3663
Ala Ser Ala Pro Lys Ser Leu Arg Ser Ser Trp Ala Ser Glu Glu Glu
1105 1110 1115

GCC AAC CCA GCA GCC ACC AAG CAA GAG GAG GTC TGG CCA GCC CTG GGG 3711
Ala Asn Pro Ala Ala Thr Lys Gln Glu Glu Val Trp Pro Ala Leu Gly
1120 1125 1130

GAC CGG GCC CTG GTG CCC ATG GTG GAG CAG CTC TTC TCT CAC CTG CTG 3759
Asp Arg Ala Leu Val Pro Met Val Glu Gln Leu Phe Ser His Leu Leu
1135 1140 1145

AAG GTG ATT AAC ATT TGT GCC CAC GTC CTG GAT GAC GTG GCT CCT GGA 3807
Lys Val Ile Asn Ile Cys Ala His Val Leu Asp Asp Val Ala Pro Gly
1150 1155 1160

CCC GCA ATA AAG GCA GCC TTG CCT TCT CTA ACA AAC CCC CCT TCT CTA 3855
Pro Ala Ile Lys Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu
1165 1170 1175 1180

AGT CCC ATC CGA CGA AAG GGG AAG GAG AAA GAA CCA GGA GAA CAA GCA 3903
Ser Pro Ile Arg Arg Lys Gly Lys Glu Lys Glu Pro Gly Glu Gln Ala
1185 1190 1195

TCT GTA CCG TTG AGT CCC AAG AAA GGC AGT GAG GCC AGT GCA GCT TCT 3951
Ser Val Pro Leu Ser Pro Lys Lys Gly Ser Glu Ala Ser Ala Ala Ser
1200 1205 1210

AGA CAA TCT GAT ACC TCA GGT CCT GTT ACA ACA AGT AAA TCC TCA TCA 3999
Arg Gln Ser Asp Thr Ser Gly Pro Val Thr Thr Ser Lys Ser Ser Ser
1215 1220 1225

CTG GGG AGT TTC TAT CAT CTT CCT TCA TAC CTC AGA CTG CAT GAT GTC 4047
Leu Gly Ser Phe Tyr His Leu Pro Ser Tyr Leu Arg Leu His Asp Val
1230 1235 1240

CTG AAA GCT ACA CAC GCT AAC TAC AAG GTC ACG CTG GAT CTT CAG AAC 4095
Leu Lys Ala Thr His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gln Asn
1245 1250 1255 1260

AGC ACG GAA AAG TTT GGA GGG TTT CTC CGC TCA GCC TTG GAT GTT CTT 4143
Ser Thr Glu Lys Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu
1265 1270 1275

TCT CAG ATA CTA GAG CTG GCC ACA CTG CAG GAC ATT GGG AAG TGT GTT 4191
Ser Gln Ile Leu Glu Leu Ala Thr Leu Gln Asp Ile Gly Lys Cys Val
1280 1285 1290

GAA GAG ATC CTA GGA TAC CTG AAA TCC TGC TTT AGT CGA GAA CCA ATG 4239
Glu Glu Ile Leu Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met
1295 1300 1305

ATG GCA ACT GTT TGT GTT CAA CAA TTG TTG AAG ACT CTC TTT GGC ACA 4287
Met Ala Thr Val Cys Val Gln Gln Leu Leu Lys Thr Leu Phe Gly Thr
1310 1315 1320

AAC TTG GCC TCC CAG TTT GAT GGC TTA TCT TCC AAC CCC AGC AAG TCA 4335
Asn Leu Ala Ser Gln Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser
1325 1330 1335 1340

CAA GGC CGA GCA CAG CGC CTT GGC TCC TCC AGT GTG AGG CCA GGC TTG 4383
Gln Gly Arg Ala Gln Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu
1345 1350 1355

TAC CAC TAC TGC TTC ATG GCC CCG TAC ACC CAC TTC ACC CAG GCC CTC 4431
Tyr His Tyr Cys Phe Met Ala Pro Tyr Thr His Phe Thr Gln Ala Leu
1360 1365 1370

GCT GAC GCC AGC CTG AGG AAC ATG GTG CAG GCG GAG CAG GAG AAC GAC 4479
Ala Asp Ala Ser Leu Arg Asn Met Val Gln Ala Glu Gln Glu Asn Asp
1375 1380 1385

ACC TCG GGA TGG TTT GAT GTC CTC CAG AAA GTG TCT ACC CAG TTG AAG 4527
Thr Ser Gly Trp Phe Asp Val Leu Gln Lys Val Ser Thr Gln Leu Lys
1390 1395 1400

ACA AAC CTC ACG AGT GTC ACA AAG AAC CGT GCA GAT AAG AAT GCT ATT 4575
Thr Asn Leu Thr Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala Ile
1405 1410 1415 1420



CAT AAT CAC ATT CGT TTG TTT GAA CCT CTT GTT ATA AAA GCT TTA AAA 4623
His Asn His Ile Arg Leu Phe Glu Pro Leu Val Ile Lys Ala Leu Lys
1425 1430 1435

CAG TAC ACG ACT ACA ACA TGT GTG CAG TTA CAG AAG CAG GTT TTA GAT 4671
Gln Tyr Thr Thr Thr Thr Cys Val Gln Leu Gln Lys Gln Val Leu Asp
1440 1445 1450

TTG CTG GCG CAG CTG GTT CAG TTA CGG GTT AAT TAC TGT CTT CTG GAT 4719
Leu Leu Ala Gln Leu Val Gln Leu Arg Val Asn Tyr Cys Leu Leu Asp
1455 1460 1465

TCA GAT CAG GTG TTT ATT GGC TTT GTA TTG AAA CAG TTT GAA TAC ATT 4767
Ser Asp Gln Val Phe Ile Gly Phe Val Leu Lys Gln Phe Glu Tyr Ile
1470 1475 1480

GAA GTG GGC CAG TTC AGG GAA TCA GAG GCA ATC ATT CCA AAC ATC TTT 4815
Glu Val Gly Gln Phe Arg Glu Ser Glu Ala Ile Ile Pro Asn Ile Phe
1485 1490 1495 1500

TTC TTC TTG GTA TTA CTA TCT TAT GAA CGC TAT CAT TCA AAA CAG ATC 4863
Phe Phe Leu Val Leu Leu Ser Tyr Glu Arg Tyr His Ser Lys Gln Ile
1505 1510 1515

ATT GGA ATT CCT AAA ATC ATT CAG CTC TGT GAT GGC ATC ATG GCC AGT 4911
Ile Gly Ile Pro Lys Ile Ile Gln Leu Cys Asp Gly Ile Met Ala Ser
1520 1525 1530

GGA AGG AAG GCT GTG ACA CAT GCC ATA CCG GCT CTG CAG CCC ATA GTC 4959
Gly Arg Lys Ala Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val
1535 1540 1545

CAC GAC CTC TTT GTA TTA AGA GGA ACA AAT AAA GCT GAT GCA GGA AAA 5007
His Asp Leu Phe Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys
1550 1555 1560

GAG CTT GAA ACC CAA AAA GAG GTG GTG GTG TCA ATG TTA CTG AGA CTC 5055
Glu Leu Glu Thr Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu
1565 1570 1575 1580

ATC CAG TAC CAT CAG GTG TTG GAG ATG TTC ATT CTT GTC CTG CAG CAG 5103
Ile Gln Tyr His Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln
1585 1590 1595

TGC CAC AAG GAG AAT GAA GAC AAG TGG AAG CGA CTG TCT CGA CAG ATA 5151
Cys His Lys Glu Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile
1600 1605 1610

GCT GAC ATC ATC CTC CCA ATG TTA GCC AAA CAG CAG ATG CAC ATT GAC 5199
Ala Asp Ile Ile Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp
1615 1620 1625

TCT CAT GAA GCC CTT GGA GTG TTA AAT ACA TTA TTT GAG ATT TTG GCC 5247
Ser His Glu Ala Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala
1630 1635 1640

CCT TCC TCC CTC CGT CCG GTA GAC ATG CTT TTA CGG AGT ATG TTC GTC 5295
Pro Ser Ser Leu Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val
1645 1650 1655 1660

ACT CCA AAC ACA ATG GCG TCC GTG AGC ACT GTT CAA CTG TGG ATA TCG 5343
Thr Pro Asn Thr Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser
1665 1670 1675

GGA ATT CTG GCC ATT TTG AGG GTT CTG ATT TCC CAG TCA ACT GAA GAT 5391
Gly Ile Leu Ala Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp
1680 1685 1690

ATT GTT CTT TCT CGT ATT CAG GAG CTC TCC TTC TCT CCG TAT TTA ATC 5439
Ile Val Leu Ser Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile
1695 1700 1705

TCC TGT ACA GTA ATT AAT AGG TTA AGA GAT GGG GAC AGT ACT TCA ACG 5487
Ser Cys Thr Val Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr
1710 1715 1720

CTA GAA GAA CAC AGT GAA GGG AAA CAA ATA AAG AAT TTG CCA GAA GAA 5535
Leu Glu Glu His Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu
1725 1730 1735 1740

ACA TTT TCA AGG TTT CTA TTA CAA CTG GTT GGT ATT CTT TTA GAA GAC 5583
Thr Phe Ser Arg Phe Leu Leu Gln Leu Val Gly Ile Leu Leu Glu Asp
1745 1750 1755

ATT GTT ACA AAA CAG CTG AAG GTG GAA ATG AGT GAG CAG CAA CAT ACT 5631
Ile Val Thr Lys Gln Leu Lys Val Glu Met Ser Glu Gln Gln His Thr
1760 1765 1770

TTC TAT TGC CAG GAA CTA GGC ACA CTG CTA ATG TGT CTG ATC CAC ATC 5679
Phe Tyr Cys Gln Glu Leu Gly Thr Leu Leu Met Cys Leu Ile His Ile
1775 1780 1785

TTC AAG TCT GGA ATG TTC CGG AGA ATC ACA GCA GCT GCC ACT AGG CTG 5727
Phe Lys Ser Gly Met Phe Arg Arg Ile Thr Ala Ala Ala Thr Arg Leu
1790 1795 1800

TTC CGC AGT GAT GGC TGT GGC GGC AGT TTC TAC ACC CTG GAC AGC TTG 5775
Phe Arg Ser Asp Gly Cys Gly Gly Ser Phe Tyr Thr Leu Asp Ser Leu
1805 1810 1815 1820

AAC TTG CGG GCT CGT TCC ATG ATC ACC ACC CAC CCG GCC CTG GTG CTG 5823
Asn Leu Arg Ala Arg Ser Met Ile Thr Thr His Pro Ala Leu Val Leu
1825 1830 1835

CTC TGG TGT CAG ATA CTG CTG CTT GTC AAC CAC ACC GAC TAC CGC TGG 5871
Leu Trp Cys Gln Ile Leu Leu Leu Val Asn His Thr Asp Tyr Arg Trp
1840 1845 1850

TGG GCA GAA GTG CAG CAG ACC CCG AAA AGA CAC AGT CTG TCC AGC ACA 5919
Trp Ala Glu Val Gln Gln Thr Pro Lys Arg His Ser Leu Ser Ser Thr
1855 1860 1865

AAG TTA CTT AGT CCC CAG ATG TCT GGA GAA GAG GAG GAT TCT GAC TTG 5967
Lys Leu Leu Ser Pro Gln Met Ser Gly Glu Glu Glu Asp Ser Asp Leu
1870 1875 1880

GCA GCC AAA CTT GGA ATG TGC AAT AGA GAA ATA GTA CGA AGA GGG GCT 6015
Ala Ala Lys Leu Gly Met Cys Asn Arg Glu Ile Val Arg Arg Gly Ala
1885 1890 1895 1900

CTC ATT CTC TTC TGT GAT TAT GTC TGT CAG AAC CTC CAT GAC TCC GAG 6063
Leu Ile Leu Phe Cys Asp Tyr Val Cys Gln Asn Leu His Asp Ser Glu
1905 1910 1915

CAC TTA ACG TGG CTC ATT GTA AAT CAC ATT CAA GAT CTG ATC AGC CTT 6111
His Leu Thr Trp Leu Ile Val Asn His Ile Gln Asp Leu Ile Ser Leu
1920 1925 1930

TCC CAC GAG CCT CCA GTA CAG GAC TTC ATC AGT GCC GTT CAT CGG AAC 6159
Ser His Glu Pro Pro Val Gln Asp Phe Ile Ser Ala Val His Arg Asn
1935 1940 1945

TCT GCT GCC AGC GGC CTG TTC ATC CAG GCA ATT CAG TCT CGT TGT GAA 6207
Ser Ala Ala Ser Gly Leu Phe Ile Gln Ala Ile Gln Ser Arg Cys Glu
1950 1955 1960

AAC CTT TCA ACT CCA ACC ATG CTG AAG AAA ACT CTT CAG TGC TTG GAG 6255
Asn Leu Ser Thr Pro Thr Met Leu Lys Lys Thr Leu Gln Cys Leu Glu
1965 1970 1975 1980

GGG ATC CAT CTC AGC CAG TCG GGA GCT GTG CTC ACG CTG TAT GTG GAC 6303
Gly Ile His Leu Ser Gln Ser Gly Ala Val Leu Thr Leu Tyr Val Asp
1985 1990 1995

AGG CTT CTG TGC ACC CCT TTC CGT GTG CTG GCT CGC ATG GTC GAC ATC 6351
Arg Leu Leu Cys Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp Ile
2000 2005 2010

CTT GCT TGT CGC CGG GTA GAA ATG CTT CTG GCT GCA AAT TTA CAG AGC 6399
Leu Ala Cys Arg Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gln Ser
2015 2020 2025

AGC ATG GCC CAG TTG CCA ATG GAA GAA CTC AAC AGA ATC CAG GAA TAC 6447
Ser Met Ala Gln Leu Pro Met Glu Glu Leu Asn Arg Ile Gln Glu Tyr
2030 2035 2040

CTT CAG AGC AGC GGG CTC GCT CAG AGA CAC CAA AGG CTC TAT TCC CTG 6495
Leu Gln Ser Ser Gly Leu Ala Gln Arg His Gln Arg Leu Tyr Ser Leu
2045 2050 2055 2060

CTG GAC AGG TTT CGT CTC TCC ACC ATG CAA GAC TCA CTT AGT CCC TCT 6543
Leu Asp Arg Phe Arg Leu Ser Thr Met Gln Asp Ser Leu Ser Pro Ser
2065 2070 2075

CCT CCA GTC TCT TCC CAC CCG CTG GAC GGG GAT GGG CAC GTG TCA CTG 6591
Pro Pro Val Ser Ser His Pro Leu Asp Gly Asp Gly His Val Ser Leu
2080 2085 2090

GAA ACA GTG AGT CCG GAC AAA GAC TGG TAC GTT CAT CTT GTC AAA TCC 6639
Glu Thr Val Ser Pro Asp Lys Asp Trp Tyr Val His Leu Val Lys Ser
2095 2100 2105

CAG TGT TGG ACC AGG TCA GAT TCT GCA CTG CTG GAA GGT GCA GAG CTG 6687
Gln Cys Trp Thr Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu
2110 2115 2120

GTG AAT CGG ATT CCT GCT GAA GAT ATG AAT GCC TTC ATG ATG AAC TCG 6735
Val Asn Arg Ile Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser
2125 2130 2135 2140

GAG TTC AAC CTA AGC CTG CTA GCT CCA TGC TTA AGC CTA GGG ATG AGT 6783
Glu Phe Asn Leu Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser
2145 2150 2155

GAA ATT TCT GGT GGC CAG AAG AGT GCC CTT TTT GAA GCA GCC CGT GAG 6831
Glu Ile Ser Gly Gly Gln Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu
2160 2165 2170

GTG ACT CTG GCC CGT GTG AGC GGC ACC GTG CAG CAG CTC CCT GCT GTC 6879
Val Thr Leu Ala Arg Val Ser Gly Thr Val Gln Gln Leu Pro Ala Val
2175 2180 2185

CAT CAT GTC TTC CAG CCC GAG CTG CCT GCA GAG CCG GCG GCC TAC TGG 6927
His His Val Phe Gln Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Trp
2190 2195 2200

AGC AAG TTG AAT GAT CTG TTT GGG GAT GCT GCA CTG TAT CAG TCC CTG 6975
Ser Lys Leu Asn Asp Leu Phe Gly Asp Ala Ala Leu Tyr Gln Ser Leu
2205 2210 2215 2220

CCC ACT CTG GCC CGG GCC CTG GCA CAG TAC CTG GTG GTG GTC TCC AAA 7023
Pro Thr Leu Ala Arg Ala Leu Ala Gln Tyr Leu Val Val Val Ser Lys
2225 2230 2235

CTG CCC AGT CAT TTG CAC CTT CCT CCT GAG AAA GAG AAG GAC ATT GTG 7071
Leu Pro Ser His Leu His Leu Pro Pro Glu Lys Glu Lys Asp Ile Val
2240 2245 2250

AAA TTC GTG GTG GCA ACC CTT GAG GCC CTG TCC TGG CAT TTG ATC CAT 7119
Lys Phe Val Val Ala Thr Leu Glu Ala Leu Ser Trp His Leu Ile His
2255 2260 2265

GAG CAG ATC CCG CTG AGT CTG GAT CTC CAG GCA GGG CTG GAC TGC TGC 7167
Glu Gln Ile Pro Leu Ser Leu Asp Leu Gln Ala Gly Leu Asp Cys Cys
2270 2275 2280

TGC CTG GCC CTG CAG CTG CCT GGC CTC TGG AGC GTG GTC TCC TCC ACA 7215
Cys Leu Ala Leu Gln Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr
2285 2290 2295 2300

GAG TTT GTG ACC CAC GCC TGC TCC CTC ATC TAC TGT GTG CAC TTC ATC 7263
Glu Phe Val Thr His Ala Cys Ser Leu Ile Tyr Cys Val His Phe Ile
2305 2310 2315

CTG GAG GCC GTT GCA GTG CAG CCT GGA GAG CAG CTT CTT AGT CCA GAA 7311
Leu Glu Ala Val Ala Val Gln Pro Gly Glu Gln Leu Leu Ser Pro Glu
2320 2325 2330

AGA AGG ACA AAT ACC CCA AAA GCC ATC AGC GAG GAG GAG GAG GAA GTA 7359
Arg Arg Thr Asn Thr Pro Lys Ala Ile Ser Glu Glu Glu Glu Glu Val
2335 2340 2345

GAT CCA AAC ACA CAG AAT CCT AAG TAT ATC ACT GCA GCC TGT GAG ATG 7407
Asp Pro Asn Thr Gln Asn Pro Lys Tyr Ile Thr Ala Ala Cys Glu Met
2350 2355 2360

GTG GCA GAA ATG GTG GAG TCT CTG CAG TCG GTG TTG GCC TTG GGT CAT 7455
Val Ala Glu Met Val Glu Ser Leu Gln Ser Val Leu Ala Leu Gly His
2365 2370 2375 2380

AAA AGG AAT AGC GGC GTG CCG GCG TTT CTC ACG CCA TTG CTC AGG AAC 7503
Lys Arg Asn Ser Gly Val Pro Ala Phe Leu Thr Pro Leu Leu Arg Asn
2385 2390 2395

ATC ATC ATC AGC CTG GCC CGC CTG CCC CTT GTC AAC AGC TAC ACA CGT 7551
Ile Ile Ile Ser Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg
2400 2405 2410

GTG CCC CCA CTG GTG TGG AAG CTT GGA TGG TCA CCC AAA CCG GGA GGG 7599
Val Pro Pro Leu Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly
2415 2420 2425

GAT TTT GGC ACA GCA TTC CCT GAG ATC CCC GTG GAG TTC CTC CAG GAA 7647
Asp Phe Gly Thr Ala Phe Pro Glu Ile Pro Val Glu Phe Leu Gln Glu
2430 2435 2440

AAG GAA GTC TTT AAG GAG TTC ATC TAC CGC ATC AAC ACA CTA GGC TGG 7695
Lys Glu Val Phe Lys Glu Phe Ile Tyr Arg Ile Asn Thr Leu Gly Trp
2445 2450 2455 2460

ACC AGT CGT ACT CAG TTT GAA GAA ACT TGG GCC ACC CTC CTT GGT GTC 7743
Thr Ser Arg Thr Gln Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly Val
2465 2470 2475

CTG GTG ACC CAG CCC CTC GTG ATG GAG CAG GAG GAG AGC CCA CCA GAA 7791
Leu Val Thr Gln Pro Leu Val Met Glu Gln Glu Glu Ser Pro Pro Glu
2480 2485 2490

GAA GAC ACA GAG AGG ACC CAG ATC AAC GTC CTG GCC GTG CAG GCC ATC 7839
Glu Asp Thr Glu Arg Thr Gln Ile Asn Val Leu Ala Val Gln Ala Ile
2495 2500 2505

ACC TCA CTG GTG CTC AGT GCA ATG ACT GTG CCT GTG GCC GGC AAC CCA 7887
Thr Ser Leu Val Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro
2510 2515 2520

GCT GTA AGC TGC TTG GAG CAG CAG CCC CGG AAC AAG CCT CTG AAA GCT 7935
Ala Val Ser Cys Leu Glu Gln Gln Pro Arg Asn Lys Pro Leu Lys Ala
2525 2530 2535 2540

CTC GAC ACC AGG TTT GGG AGG AAG CTG AGC ATT ATC AGA GGG ATT GTG 7983
Leu Asp Thr Arg Phe Gly Arg Lys Leu Ser Ile Ile Arg Gly Ile Val
2545 2550 2555

GAG CAA GAG ATT CAA GCA ATG GTT TCA AAG AGA GAG AAT ATT GCC ACC 8031
Glu Gln Glu Ile Gln Ala Met Val Ser Lys Arg Glu Asn Ile Ala Thr
2560 2565 2570

CAT CAT TTA TAT CAG GCA TGG GAT CCT GTC CCT TCT CTG TCT CCG GCT 8079
His His Leu Tyr Gln Ala Trp Asp Pro Val Pro Ser Leu Ser Pro Ala
2575 2580 2585

ACT ACA GGT GCC CTC ATC AGC CAC GAG AAG CTG CTG CTA CAG ATC AAC 8127
Thr Thr Gly Ala Leu Ile Ser His Glu Lys Leu Leu Leu Gln Ile Asn
2590 2595 2600

CCC GAG CGG GAG CTG GGG AGC ATG AGC TAC AAA CTC GGC CAG GTG TCC 8175
Pro Glu Arg Glu Leu Gly Ser Met Ser Tyr Lys Leu Gly Gln Val Ser
2605 2610 2615 2620

ATA CAC TCC GTG TGG CTG GGG AAC AGC ATC ACA CCC CTG AGG GAG GAG 8223
Ile His Ser Val Trp Leu Gly Asn Ser Ile Thr Pro Leu Arg Glu Glu
2625 2630 2635

GAA TGG GAC GAG GAA GAG GAG GAG GAG GCC GAC GCC CCT GCA CCT TCG 8271
Glu Trp Asp Glu Glu Glu Glu Glu Glu Ala Asp Ala Pro Ala Pro Ser
2640 2645 2650

TCA CCA CCC ACG TCT CCA GTC AAC TCC AGG AAA CAC CGG GCT GGA GTT 8319
Ser Pro Pro Thr Ser Pro Val Asn Ser Arg Lys His Arg Ala Gly Val
2655 2660 2665

GAC ATC CAC TCC TGT TCG CAG TTT TTG CTT GAG TTG TAC AGC CGC TGG 8367
Asp Ile His Ser Cys Ser Gln Phe Leu Leu Glu Leu Tyr Ser Arg Trp
2670 2675 2680

ATC CTG CCG TCC AGC TCA GCC AGG AGG ACC CCG GCC ATC CTG ATC AGT 3415 
lie Leu Pro Ser Ser Ser Ala Arg Arg Thr Pro Ala He Leu He Ser 
2585 2690 2695 2700 

GAG GTG GTC AGA TCC CTT CTA GTG GTC TCA GAC TTG TTC ACC GAG CGC 8463
Glu Val Val Arg Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Arg
2705 2710 2715

AAC CAG TTT GAG CTG ATG TAT GTG ACG CTG ACA GAA CTG CGA AGG GTG 8511
Asn Gln Phe Glu Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val
2720 2725 2730

CAC CCT TCA GAA GAC GAG ATC CTC GCT CAG TAC CTG GTG CCT GCC ACC 8559
His Pro Ser Glu Asp Glu Ile Leu Ala Gln Tyr Leu Val Pro Ala Thr
2735 2740 2745

TGC AAG GCA GCT GCC GTC CTT GGG ATG GAC AAG GCC GTG GCG GAG CCT 8607
Cys Lys Ala Ala Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro
2750 2755 2760

GTC AGC CGC CTG CTG GAG AGC ACG CTC AGG AGC AGC CAC CTG CCC AGC 8655
Val Ser Arg Leu Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser
2765 2770 2775 2780

AGG GTT GGA GCC CTG CAC GGC ATC CTC TAT GTG CTG GAG TGC GAC CTG 8703
Arg Val Gly Ala Leu His Gly Ile Leu Tyr Val Leu Glu Cys Asp Leu
2785 2790 2795

CTG GAC GAC ACT GCC AAG CAG CTC ATC CCG GTC ATC AGC GAC TAT CTC 8751
Leu Asp Asp Thr Ala Lys Gln Leu Ile Pro Val Ile Ser Asp Tyr Leu
2800 2805 2810

CTC TCC AAC CTG AAA GGG ATC GCC CAC TGC GTG AAC ATT CAC AGC CAG 8799
Leu Ser Asn Leu Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln
2815 2820 2825

CAG CAC GTA CTG GTC ATG TGT GCC ACT GCG TTT TAC CTC ATT GAG AAC 8847
Gln His Val Leu Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn
2830 2835 2840

TAT CCT CTG GAC GTA GGG CCG GAA TTT TCA GCA TCA ATA ATA CAG ATG 8895
Tyr Pro Leu Asp Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met
2845 2850 2855 2860

TGT GGG GTG ATG CTG TCT GGA AGT GAG GAG TCC ACC CCC TCC ATC ATT 8943
Cys Gly Val Met Leu Ser Gly Ser Glu Glu Ser Thr Pro Ser Ile Ile
2865 2870 2875

TAC CAC TGT GCC CTC AGA GGC CTG GAG CGC CTC CTG CTC TCT GAG CAG 8991
Tyr His Cys Ala Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln
2880 2885 2890

CTC TCC CGC CTG GAT GCA GAA TCG CTG GTC AAG CTG AGT GTG GAC AGA 9039
Leu Ser Arg Leu Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg
2895 2900 2905

GTG AAC GTG CAC AGC CCG CAC CGG GCC ATG GCG GCT CTG GGC CTG ATG 9087
Val Asn Val His Ser Pro His Arg Ala Met Ala Ala Leu Gly Leu Met
2910 2915 2920

CTC ACC TGC ATG TAC ACA GGA AAG GAG AAA GTC AGT CCG GGT AGA ACT 9135
Leu Thr Cys Met Tyr Thr Gly Lys Glu Lys Val Ser Pro Gly Arg Thr
2925 2930 2935 2940

TCA GAC CCT AAT CCT GCA GCC CCC GAC AGC GAG TCA GTG ATT GTT GCT 9183
Ser Asp Pro Asn Pro Ala Ala Pro Asp Ser Glu Ser Val Ile Val Ala
2945 2950 2955

ATG GAG CGG GTA TCT GTT CTT TTT GAT AGG ATC AGG AAA GGC TTT CCT 9231
Met Glu Arg Val Ser Val Leu Phe Asp Arg Ile Arg Lys Gly Phe Pro
2960 2965 2970

TGT GAA GCC AGA GTG GTG GCC AGG ATC CTG CCC CAG TTT CTA GAC GAC 9279
Cys Glu Ala Arg Val Val Ala Arg Ile Leu Pro Gln Phe Leu Asp Asp
2975 2980 2985

TTC TTC CCA CCC CAG GAC ATC ATG AAC AAA GTC ATC GGA GAG TTT CTG 9327
Phe Phe Pro Pro Gln Asp Ile Met Asn Lys Val Ile Gly Glu Phe Leu
2990 2995 3000

TCC AAC CAG CAG CCA TAC CCC CAG TTC ATG GCC ACC GTG GTG TAT AAG 9375 
Ser Asn Gln Gln Pro Tyr Pro Gln Phe Met Ala Thr Val Val Tyr Lys
3005 3010 3015 3020

GTG TTT CAG ACT CTG CAC AGC ACC GGG CAG TCG TCC ATG GTC CGG GAC 9423
Val Phe Gln Thr Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp
3025 3030 3035

TGG GTC ATG CTG TCC CTC TCC AAC TTC ACG CAG AGG GCC CCG GTC GCC 9471
Trp Val Met Leu Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala
3040 3045 3050

ATG GCC ACG TGG AGC CTC TCC TGC TTC TTT GTC AGC GCG TCC ACC AGC 9519
Met Ala Thr Trp Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser
3055 3060 3065

CCG TGG GTC GCG GCG ATC CTC CCA CAT GTC ATC AGC AGG ATG GGC AAG 9567
Pro Trp Val Ala Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys
3070 3075 3080 

CTG GAG CAG GTG GAC GTG AAC CTT TTC TGC CTG GTC GCC ACA GAC TTC 9615
Leu Glu Gln Val Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe
3085 3090 3095 3100 

TAC AGA CAC CAG ATA GAG GAG GAG CTC GAC CGC AGG GCC TTC CAG TCT 9663
Tyr Arg His Gln Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser
3105 3110 3115

GTG CTT GAG GTG GTT GCA GCC CCA GGA AGC CCA TAT CAC CGG CTG CTG 9711
Val Leu Glu Val Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu
3120 3125 3130

ACT TGT TTA CGA AAT GTC CAC AAG GTC ACC ACC TGC T GAGCGCCATG 9758
Thr Cys Leu Arg Asn Val His Lys Val Thr Thr Cys
3135 3140

GTGGGAGAGA CTGTGAGGCG GCAGCTGGGG CCGGAGCCTT TGGAAGTCTG TGCCCTTGTG 9818 

CCCTGCCTCC ACCGAGCCAG CTTGGTCCCT ATGGGCTTCC GCACATGCCG CGGGCGGCCA 9878 

GGCAACGTGC GTGTCTCTGC CATGTGGCAG AAGTGCTCTT TGTGGCAGTG GCCAGGCAGG 9938

GAGTGTCTGC AGTCCTGGTG GGGCTGAGCC TGAGGCCTTC CAGAAAGCAG GAGCAGCTGT 9998

GCTGCACCCC ATGTGGGTGA CCAGGTCCTT TCTCCTGATA GTCACCTGCT GGTTGTTGCC 10058 

AGGTTGCAGC TGCTCTTGCA TCTGGGCCAG AAGTCCTCCC TCCTGCAGGC TGGCTGTTGG 10118 

CCCCTCTGCT GTCCTGCAGT AGAAGGTGCC GTGAGCAGGC TTTGGGAACA CTGGCCTGGG 10178 

TCTCCCTGGT GGGGTGTGCA TGCCACGCCC CGTGTCTGGA TGCACAGATG CCATGGCCTG 10238

TGCTGGGCCA GTGGCTGGGG GTGCTAGACA CCCGGCACCA TTCTCCCTTC TCTCTTTTCT 10298 

TCTCAGGATT TAAAATTTAA TTATATCAGT AAAGAGATTA ATTTTAACGT AAAAAAAAAA 10358

AAAAAAAA 10366 
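
The listing above pairs each row of codons with its three-letter amino acid translation. The following is a minimal sketch, assuming the standard genetic code (the listing does not itself name a code table), of how one such row could be re-translated to check the printed residues. The function names `translate_row` and `mismatches` are illustrative only and are not part of the specification.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
# Assumption: the listing uses the standard code; it does not say so explicitly.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue names ("Ter" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_row(codons: str) -> list[str]:
    """Translate a whitespace-separated row of codons into three-letter residues."""
    return [THREE_LETTER[CODON_TABLE[c]] for c in codons.upper().split()]


def mismatches(codons: str, printed: str) -> list[tuple[int, str, str]]:
    """Return (index, printed, translated) for every residue that disagrees."""
    translated = translate_row(codons)
    return [
        (i, p, t)
        for i, (p, t) in enumerate(zip(printed.split(), translated))
        if p != t
    ]


# The row of SEQ ID NO:5 that ends at nucleotide 9423, with its printed translation.
row = "GTG TTT CAG ACT CTG CAC AGC ACC GGG CAG TCG TCC ATG GTC CGG GAC"
printed = "Val Phe Gln Thr Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp"
print(mismatches(row, printed))  # an empty list means the row is self-consistent
```

A check of this kind is useful mainly for spotting transcription errors, since a miscopied codon or residue shows up as a non-empty mismatch list.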

(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 3144 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 :

Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu Ser Leu Lys Ser 
1 5 10 15

Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln
20 25 30

Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Pro Pro Pro Pro Pro
35 40 45

Pro Pro Pro Gln Leu Pro Gln Pro Pro Pro Gln Ala Gln Pro Leu Leu
50 55 60

Pro Gln Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Gly Pro
65 70 75 80

Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys Glu Leu Ser Ala
85 90 95 

Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile Cys Glu Asn Ile
100 105 110

Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln Lys Leu Leu Gly
115 120 125

Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp Ala Glu Ser Asp
130 135 140

Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val Ile Lys Ala Leu
145 150 155 160

Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu Tyr Lys Glu Ile
165 170 175

Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala Leu Trp Arg Phe
180 185 190

Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys Arg Pro Tyr Leu
195 200 205 

Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys Arg Pro Glu Glu 
210 215 220 

Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys Ile Met Ala Ser
225 230 235 240

Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val Leu Leu Lys Ala
245 250 255

Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile Arg Arg Thr Ala
260 265 270

Ala Gly Ser Ala Val Ser Ile Cys Gln His Ser Arg Arg Thr Gln Tyr
275 280 285

Phe Tyr Ser Trp Leu Leu Asn Val Leu Leu Gly Leu Leu Val Pro Val 
290 295 300 

Glu Asp Glu His Ser Thr Leu Leu Ile Leu Gly Val Leu Leu Thr Leu
305 310 315 320

Arg Tyr Leu Val Pro Leu Leu Gln Gln Gln Val Lys Asp Thr Ser Leu
325 330 335

Lys Gly Ser Phe Gly Val Thr Arg Lys Glu Met Glu Val Ser Pro Ser
340 345 350

Ala Glu Gln Leu Val Gln Val Tyr Glu Leu Thr Leu His His Thr Gln
355 360 365

His Gln Asp His Asn Val Val Thr Gly Ala Leu Glu Leu Leu Gln Gln
370 375 380

Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr Leu Thr Ala Val
385 390 395 400

Gly Gly Ile Gly Gln Leu Thr Ala Ala Lys Glu Glu Ser Gly Gly Arg
405 410 415

Ser Arg Ser Gly Ser Ile Val Glu Leu Ile Ala Gly Gly Gly Ser Ser
420 425 430

Cys Ser Pro Val Leu Ser Arg Lys Gln Lys Gly Lys Val Leu Leu Gly
435 440 445

Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg Ser Asp Val Ser
450 455 460

Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu Ile Ser Gly Glu Leu
465 470 475 480

Ala Ala Ser Ser Gly Val Ser Thr Pro Gly Ser Ala Gly His Asp Ile
485 490 495

Ile Thr Glu Gln Pro Arg Ser Gln His Thr Leu Gln Ala Asp Ser Leu
500 505 510

Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr Asp Gly Asp Glu 

515 520 525

Glu Asp Ile Leu Ser His Ser Ser Ser Gln Val Ser Ala Val Pro Ser
530 535 540

Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gln Ala Ser Ser Pro Ile
545 550 555 560

Ser Asp Ser Ser Gln Thr Thr Thr Glu Gly Pro Asp Ser Ala Val Thr
565 570 575

Pro Ser Asp Ser Ser Glu Ile Val Leu Asp Gly Thr Asp Asn Gln Tyr
580 585 590

Leu Gly Leu Gln Ile Gly Gln Pro Gln Asp Glu Asp Glu Glu Ala Thr
595 600 605

Gly Ile Leu Pro Asp Glu Ala Ser Glu Ala Phe Arg Asn Ser Ser Met
610 615 620

Ala Leu Gln Gln Ala His Leu Leu Lys Asn Met Ser His Cys Arg Gln
625 630 635 640

Pro Ser Asp Ser Ser Val Asp Lys Phe Val Leu Arg Asp Glu Ala Thr 
645 650 655 

Glu Pro Gly Asp Gln Glu Asn Lys Pro Cys Arg Ile Lys Gly Asp Ile
660 665 670

Gly Gln Ser Thr Asp Asp Asp Ser Ala Pro Leu Val His Ser Val Arg
675 680 685

Leu Leu Ser Ala Ser Phe Leu Leu Thr Gly Gly Lys Asn Val Leu Val 
690 695 700

Pro Asp Arg Asp Val Arg Val Ser Val Lys Ala Leu Ala Leu Ser Cys
705 710 715 720

Val Gly Ala Ala Val Ala Leu His Pro Glu Ser Phe Phe Ser Lys Leu
725 730 735

Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pro Glu Glu Gln Tyr Val
740 745 750

Ser Asp Ile Leu Asn Tyr Ile Asp His Gly Asp Pro Gln Val Arg Gly
755 760 765

Ala Thr Ala Ile Leu Cys Gly Thr Leu Ile Cys Ser Ile Leu Ser Arg
770 775 780

Ser Arg Phe His Val Gly Asp Trp Met Gly Thr Ile Arg Thr Leu Thr
785 790 795 800

Gly Asn Thr Phe Ser Leu Ala Asp Cys Ile Pro Leu Leu Arg Lys Thr
805 810 815

Leu Lys Asp Glu Ser Ser Val Thr Cys Lys Leu Ala Cys Thr Ala Val 
820 825 830 

Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr Ser Glu Leu Gly 
835 840 845 

Leu Gln Leu Ile Ile Asp Val Leu Thr Leu Arg Asn Ser Ser Tyr Trp
850 855 860

Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu Ile Asp Phe Arg
865 870 875 880 

Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu His Arg Gly Ala 
885 890 895 

His His Tyr Thr Gly Leu Leu Lys Leu Gln Glu Arg Val Leu Asn Asn
900 905 910

Val Val Ile His Leu Leu Gly Asp Glu Asp Pro Arg Val Arg His Val
915 920 925

Ala Ala Ala Ser Leu Ile Arg Leu Val Pro Lys Leu Phe Tyr Lys Cys
930 935 940

Asp Gln Gly Gln Ala Asp Pro Val Val Ala Val Ala Arg Asp Gln Ser
945 950 955 960

Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gln Pro Pro Ser His
965 970 975

Phe Ser Val Ser Thr Ile Thr Arg Ile Tyr Arg Gly Tyr Asn Leu Leu
980 985 990

Pro Ser Ile Thr Asp Val Thr Met Glu Asn Asn Leu Ser Arg Val Ile
995 1000 1005

Ala Ala Val Ser His Glu Leu Ile Thr Ser Thr Thr Arg Ala Leu Thr
1010 1015 1020

Phe Gly Cys Cys Glu Ala Leu Cys Leu Leu Ser Thr Ala Phe Pro Val
1025 1030 1035 1040

Cys Ile Trp Ser Leu Gly Trp His Cys Gly Val Pro Pro Leu Ser Ala
1045 1050 1055

Ser Asp Glu Ser Arg Lys Ser Cys Thr Val Gly Met Ala Thr Met Ile
1060 1065 1070

Leu Thr Leu Leu Ser Ser Ala Trp Phe Pro Leu Asp Leu Ser Ala His 
1075 1080 1085 

Gln Asp Ala Leu Ile Leu Ala Gly Asn Leu Leu Ala Ala Ser Ala Pro
1090 1095 1100

Lys Ser Leu Arg Ser Ser Trp Ala Ser Glu Glu Glu Ala Asn Pro Ala
1105 1110 1115 1120

Ala Thr Lys Gln Glu Glu Val Trp Pro Ala Leu Gly Asp Arg Ala Leu
1125 1130 1135

Val Pro Met Val Glu Gln Leu Phe Ser His Leu Leu Lys Val Ile Asn
1140 1145 1150

Ile Cys Ala His Val Leu Asp Asp Val Ala Pro Gly Pro Ala Ile Lys
1155 1160 1165

Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu Ser Pro Ile Arg
1170 1175 1180

Arg Lys Gly Lys Glu Lys Glu Pro Gly Glu Gln Ala Ser Val Pro Leu
1185 1190 1195 1200

Ser Pro Lys Lys Gly Ser Glu Ala Ser Ala Ala Ser Arg Gln Ser Asp
1205 1210 1215

Thr Ser Gly Pro Val Thr Thr Ser Lys Ser Ser Ser Leu Gly Ser Phe
1220 1225 1230

Tyr His Leu Pro Ser Tyr Leu Arg Leu His Asp Val Leu Lys Ala Thr
1235 1240 1245

His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gln Asn Ser Thr Glu Lys
1250 1255 1260

Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu Ser Gln Ile Leu
1265 1270 1275 1280

Glu Leu Ala Thr Leu Gln Asp Ile Gly Lys Cys Val Glu Glu Ile Leu
1285 1290 1295

Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met Met Ala Thr Val 
1300 1305 1310 

Cys Val Gln Gln Leu Leu Lys Thr Leu Phe Gly Thr Asn Leu Ala Ser
1315 1320 1325

Gln Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser Gln Gly Arg Ala
1330 1335 1340

Gln Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu Tyr His Tyr Cys
1345 1350 1355 1360

Phe Met Ala Pro Tyr Thr His Phe Thr Gln Ala Leu Ala Asp Ala Ser
1365 1370 1375

Leu Arg Asn Met Val Gln Ala Glu Gln Glu Asn Asp Thr Ser Gly Trp
1380 1385 1390

Phe Asp Val Leu Gln Lys Val Ser Thr Gln Leu Lys Thr Asn Leu Thr
1395 1400 1405

Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala Ile His Asn His Ile
1410 1415 1420

Arg Leu Phe Glu Pro Leu Val Ile Lys Ala Leu Lys Gln Tyr Thr Thr
1425 1430 1435 1440

Thr Thr Cys Val Gln Leu Gln Lys Gln Val Leu Asp Leu Leu Ala Gln
1445 1450 1455

Leu Val Gln Leu Arg Val Asn Tyr Cys Leu Leu Asp Ser Asp Gln Val
1460 1465 1470



Phe Ile Gly Phe Val Leu Lys Gln Phe Glu Tyr Ile Glu Val Gly Gln
1475 1480 1485

Phe Arg Glu Ser Glu Ala Ile Ile Pro Asn Ile Phe Phe Phe Leu Val
1490 1495 1500

Leu Leu Ser Tyr Glu Arg Tyr His Ser Lys Gln Ile Ile Gly Ile Pro
1505 1510 1515 1520

Lys Ile Ile Gln Leu Cys Asp Gly Ile Met Ala Ser Gly Arg Lys Ala
1525 1530 1535

Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val His Asp Leu Phe
1540 1545 1550

Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys Glu Leu Glu Thr
1555 1560 1565

Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu Ile Gln Tyr His
1570 1575 1580

Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln Cys His Lys Glu
1585 1590 1595 1600

Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile Ala Asp Ile Ile
1605 1610 1615

Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp Ser His Glu Ala
1620 1625 1630

Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala Pro Ser Ser Leu
1635 1640 1645

Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val Thr Pro Asn Thr
1650 1655 1660

Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser Gly Ile Leu Ala
1665 1670 1675 1680

Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp Ile Val Leu Ser
1685 1690 1695

Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile Ser Cys Thr Val
1700 1705 1710

Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr Leu Glu Glu His
1715 1720 1725

Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu Thr Phe Ser Arg
1730 1735 1740

Phe Leu Leu Gln Leu Val Gly Ile Leu Leu Glu Asp Ile Val Thr Lys
1745 1750 1755 1760

Gln Leu Lys Val Glu Met Ser Glu Gln Gln His Thr Phe Tyr Cys Gln
1765 1770 1775

Glu Leu Gly Thr Leu Leu Met Cys Leu Ile His Ile Phe Lys Ser Gly
1780 1785 1790

Met Phe Arg Arg Ile Thr Ala Ala Ala Thr Arg Leu Phe Arg Ser Asp
1795 1800 1805

Gly Cys Gly Gly Ser Phe Tyr Thr Leu Asp Ser Leu Asn Leu Arg Ala
1810 1815 1820

Arg Ser Met Ile Thr Thr His Pro Ala Leu Val Leu Leu Trp Cys Gln
1825 1830 1835 1840

Ile Leu Leu Leu Val Asn His Thr Asp Tyr Arg Trp Trp Ala Glu Val
1845 1850 1855

Gln Gln Thr Pro Lys Arg His Ser Leu Ser Ser Thr Lys Leu Leu Ser
1860 1865 1870

Pro Gln Met Ser Gly Glu Glu Glu Asp Ser Asp Leu Ala Ala Lys Leu
1875 1880 1885

Gly Met Cys Asn Arg Glu Ile Val Arg Arg Gly Ala Leu Ile Leu Phe
1890 1895 1900

Cys Asp Tyr Val Cys Gln Asn Leu His Asp Ser Glu His Leu Thr Trp
1905 1910 1915 1920

Leu Ile Val Asn His Ile Gln Asp Leu Ile Ser Leu Ser His Glu Pro
1925 1930 1935

Pro Val Gln Asp Phe Ile Ser Ala Val His Arg Asn Ser Ala Ala Ser
1940 1945 1950

Gly Leu Phe Ile Gln Ala Ile Gln Ser Arg Cys Glu Asn Leu Ser Thr
1955 1960 1965

Pro Thr Met Leu Lys Lys Thr Leu Gln Cys Leu Glu Gly Ile His Leu
1970 1975 1980

Ser Gln Ser Gly Ala Val Leu Thr Leu Tyr Val Asp Arg Leu Leu Cys
1985 1990 1995 2000

Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp Ile Leu Ala Cys Arg
2005 2010 2015

Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gln Ser Ser Met Ala Gln
2020 2025 2030

Leu Pro Met Glu Glu Leu Asn Arg Ile Gln Glu Tyr Leu Gln Ser Ser
2035 2040 2045

Gly Leu Ala Gln Arg His Gln Arg Leu Tyr Ser Leu Leu Asp Arg Phe
2050 2055 2060

Arg Leu Ser Thr Met Gln Asp Ser Leu Ser Pro Ser Pro Pro Val Ser
2065 2070 2075 2080

Ser His Pro Leu Asp Gly Asp Gly His Val Ser Leu Glu Thr Val Ser
2085 2090 2095

Pro Asp Lys Asp Trp Tyr Val His Leu Val Lys Ser Gln Cys Trp Thr
2100 2105 2110

Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu Val Asn Arg Ile
2115 2120 2125

Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser Glu Phe Asn Leu
2130 2135 2140

Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser Glu Ile Ser Gly
2145 2150 2155 2160

Gly Gln Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu Val Thr Leu Ala
2165 2170 2175

Arg Val Ser Gly Thr Val Gln Gln Leu Pro Ala Val His His Val Phe
2180 2185 2190

Gln Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Trp Ser Lys Leu Asn
2195 2200 2205

Asp Leu Phe Gly Asp Ala Ala Leu Tyr Gln Ser Leu Pro Thr Leu Ala
2210 2215 2220

Arg Ala Leu Ala Gln Tyr Leu Val Val Val Ser Lys Leu Pro Ser His
2225 2230 2235 2240

Leu His Leu Pro Pro Glu Lys Glu Lys Asp Ile Val Lys Phe Val Val
2245 2250 2255

Ala Thr Leu Glu Ala Leu Ser Trp His Leu Ile His Glu Gln Ile Pro
2260 2265 2270

Leu Ser Leu Asp Leu Gln Ala Gly Leu Asp Cys Cys Cys Leu Ala Leu
2275 2280 2285

Gln Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr Glu Phe Val Thr
2290 2295 2300

His Ala Cys Ser Leu Ile Tyr Cys Val His Phe Ile Leu Glu Ala Val
2305 2310 2315 2320

Ala Val Gln Pro Gly Glu Gln Leu Leu Ser Pro Glu Arg Arg Thr Asn
2325 2330 2335

Thr Pro Lys Ala Ile Ser Glu Glu Glu Glu Glu Val Asp Pro Asn Thr
2340 2345 2350

Gln Asn Pro Lys Tyr Ile Thr Ala Ala Cys Glu Met Val Ala Glu Met
2355 2360 2365

Val Glu Ser Leu Gln Ser Val Leu Ala Leu Gly His Lys Arg Asn Ser
2370 2375 2380

Gly Val Pro Ala Phe Leu Thr Pro Leu Leu Arg Asn Ile Ile Ile Ser
2385 2390 2395 2400

Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg Val Pro Pro Leu 
2405 2410 2415 

Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly Asp Phe Gly Thr 
2420 2425 2430

Ala Phe Pro Glu Ile Pro Val Glu Phe Leu Gln Glu Lys Glu Val Phe
2435 2440 2445

Lys Glu Phe Ile Tyr Arg Ile Asn Thr Leu Gly Trp Thr Ser Arg Thr
2450 2455 2460

Gln Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly Val Leu Val Thr Gln
2465 2470 2475 2480

Pro Leu Val Met Glu Gln Glu Glu Ser Pro Pro Glu Glu Asp Thr Glu
2485 2490 2495

Arg Thr Gln Ile Asn Val Leu Ala Val Gln Ala Ile Thr Ser Leu Val
2500 2505 2510

Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro Ala Val Ser Cys
2515 2520 2525

Leu Glu Gln Gln Pro Arg Asn Lys Pro Leu Lys Ala Leu Asp Thr Arg
2530 2535 2540

Phe Gly Arg Lys Leu Ser Ile Ile Arg Gly Ile Val Glu Gln Glu Ile
2545 2550 2555 2560

Gln Ala Met Val Ser Lys Arg Glu Asn Ile Ala Thr His His Leu Tyr
2565 2570 2575

Gln Ala Trp Asp Pro Val Pro Ser Leu Ser Pro Ala Thr Thr Gly Ala
2580 2585 2590

Leu Ile Ser His Glu Lys Leu Leu Leu Gln Ile Asn Pro Glu Arg Glu
2595 2600 2605

Leu Gly Ser Met Ser Tyr Lys Leu Gly Gln Val Ser Ile His Ser Val
2610 2615 2620

Trp Leu Gly Asn Ser Ile Thr Pro Leu Arg Glu Glu Glu Trp Asp Glu
2625 2630 2635 2640

Glu Glu Glu Glu Glu Ala Asp Ala Pro Ala Pro Ser Ser Pro Pro Thr 
2645 2650 2655

Ser Pro Val Asn Ser Arg Lys His Arg Ala Gly Val Asp Ile His Ser
2660 2665 2670

Cys Ser Gln Phe Leu Leu Glu Leu Tyr Ser Arg Trp Ile Leu Pro Ser
2675 2680 2685

Ser Ser Ala Arg Arg Thr Pro Ala Ile Leu Ile Ser Glu Val Val Arg
2690 2695 2700

Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Arg Asn Gln Phe Glu
2705 2710 2715 2720

Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val His Pro Ser Glu 
2725 2730 2735 

Asp Glu Ile Leu Ala Gln Tyr Leu Val Pro Ala Thr Cys Lys Ala Ala
2740 2745 2750

Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro Val Ser Arg Leu 
2755 2760 2765

Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser Arg Val Gly Ala
2770 2775 2780

Leu His Gly Ile Leu Tyr Val Leu Glu Cys Asp Leu Leu Asp Asp Thr
2785 2790 2795 2800

Ala Lys Gln Leu Ile Pro Val Ile Ser Asp Tyr Leu Leu Ser Asn Leu
2805 2810 2815

Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln Gln His Val Leu
2820 2825 2830

Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn Tyr Pro Leu Asp
2835 2840 2845

Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met Cys Gly Val Met
2850 2855 2860

Leu Ser Gly Ser Glu Glu Ser Thr Pro Ser Ile Ile Tyr His Cys Ala
2865 2870 2875 2880

Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln Leu Ser Arg Leu
2885 2890 2895

Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg Val Asn Val His 
2900 2905 2910 

Ser Pro His Arg Ala Met Ala Ala Leu Gly Leu Met Leu Thr Cys Met
2915 2920 2925

Tyr Thr Gly Lys Glu Lys Val Ser Pro Gly Arg Thr Ser Asp Pro Asn 
2930 2935 2940 

Pro Ala Ala Pro Asp Ser Glu Ser Val Ile Val Ala Met Glu Arg Val
2945 2950 2955 2960

Ser Val Leu Phe Asp Arg Ile Arg Lys Gly Phe Pro Cys Glu Ala Arg
2965 2970 2975 

Val Val Ala Arg Ile Leu Pro Gln Phe Leu Asp Asp Phe Phe Pro Pro
2980 2985 2990

Gln Asp Ile Met Asn Lys Val Ile Gly Glu Phe Leu Ser Asn Gln Gln
2995 3000 3005

Pro Tyr Pro Gln Phe Met Ala Thr Val Val Tyr Lys Val Phe Gln Thr
3010 3015 3020

Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp Trp Val Met Leu
3025 3030 3035 3040

Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala Met Ala Thr Trp
3045 3050 3055

Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser Pro Trp Val Ala 
3060 3065 3070

Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys Leu Glu Gln Val
3075 3080 3085

Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe Tyr Arg His Gln
3090 3095 3100

Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser Val Leu Glu Val
3105 3110 3115 3120

Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu Thr Cys Leu Arg 
3125 3130 3135 

Asn Val His Lys Val Thr Thr Cys 
3140 
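
The amino-terminal region of SEQ ID NO:6 contains an uninterrupted stretch of glutamine residues followed by a proline-rich segment. As an illustration only, the sketch below counts the longest consecutive run of a given residue in a three-letter sequence string. The helper name `longest_run` is hypothetical, and the input reproduces the first 51 residues of the listing above.

```python
def longest_run(residues: str, target: str = "Gln") -> int:
    """Length of the longest consecutive run of `target` in a
    whitespace-separated string of three-letter residue names."""
    best = current = 0
    for res in residues.split():
        current = current + 1 if res == target else 0
        best = max(best, current)
    return best


# Residues 1-51 of SEQ ID NO:6 as printed in the listing:
# 17 leading residues, then the glutamine stretch, then 11 prolines.
n_terminus = (
    "Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu Ser Leu Lys Ser Phe "
    + "Gln " * 23
    + "Pro " * 11
)

print(longest_run(n_terminus, "Gln"))  # 23
print(longest_run(n_terminus, "Pro"))  # 11
```

For the sequence as listed, the glutamine run spans residues 18 to 40.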
Claims

1. An isolated, purified or recombinant polypeptide comprising a huntingtin protein or a mutant, fragment
or variant thereof having substantially the same activity as huntingtin protein. 

2. A polypeptide according to claim 1 having the amino acid sequence shown in SEQ ID NO:6. 

3. A polypeptide according to claim 1 or 2 which is essentially purified and/or has at least 5 contiguous amino 
acids. 

4. An isolated, purified or recombinant nucleic acid molecule comprising nucleic acid which is: 

(a) a sequence encoding a huntingtin protein according to any preceding claim (whether normal or ge- 
netically defective), or its complementary strand; 

(b) a sequence that is substantially homologous to, or hybridises under stringent conditions to, either 
sequence in (a); 

(c) a sequence that is substantially homologous to, or would hybridise under stringent conditions to, a 
sequence in (a) or (b) but for the degeneracy of the genetic code; 

or a fragment of any of (a), (b) or (c). 

5. A nucleic acid according to claim 1, wherein the huntingtin protein has the amino acid sequence shown 
in SEQ ID NO:6 and/or the nucleic acid is DNA encoding the amino acid sequence SEQ ID NO:5. 

6. A nucleic acid molecule according to claim 4 or 5 which is a probe for detecting the presence of huntingtin 
in a sample comprising being at least 5, such as at least 15, contiguous nucleotides.

7. A (preferably recombinant) nucleic acid molecule according to any of claims 4 to 6 comprising a transcriptional
region functional in a cell operably linked to a sequence complementary to an RNA sequence encoding
a protein according to any of claims 1 to 3 or at least 5 contiguous amino acids thereof.

8. A vector comprising a nucleic acid molecule according to any of claims 4 to 7. 

9. A vector according to claim 8 wherein the nucleic acid molecule, such as encoding huntingtin protein, is 
operably linked to transcriptional and/or translational expression signals. 

10. A host cell transformed or transfected with a vector according to claim 4 or 5. 

11. An antibody specific for huntingtin protein, or a protein according to any of claims 1 to 3. 

12. A hybridoma which produces an antibody according to claim 11. 

13. A method of detecting the presence of, or predisposition to develop, Huntington's disease in a subject, 
the method comprising evaluating the characteristics of huntingtin nucleic acid in a sample from the sub- 
ject, for example in relation to the number of (CAG) repeats. 

14. A method according to claim 13 comprising:

(a) taking a sample from the subject; 

(b) evaluating the characteristics of huntingtin nucleic acid in the sample, wherein the evaluation com- 
prises detecting the huntingtin (CAG) n region in the sample; and 

(c) comparing the characteristics found in (b) with a similar analysis from an individual not having, or 
not suspected of having, Huntington's disease; and

(d) the presence of, or predisposition to develop, Huntington's disease being indicated if those char- 
acteristics in the huntingtin (CAG) n region differ. 

15. A method according to claim 13 comprising: 
(a) taking a sample from a subject; and

(b) evaluating the characteristics of huntingtin nucleic acid comprising the huntingtin (CAG) n region in 
the sample by Southern blot, northern blot, or polymerase chain reaction analysis. 
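
Claims 13 to 15 turn on evaluating the number of (CAG) repeats in a sample. Once a nucleotide sequence spanning the repeat region is in hand, the count itself can be sketched as below. This is a simplified illustration, not the assay of the claims: it scans a plain sequence string for the longest uninterrupted tandem run of CAG and ignores reading frame and interrupting codons such as CAA. The 11 to 34 bounds are taken from claim 19, and the function names and the test sequence are invented for the example.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Number of repeats in the longest uninterrupted (CAG)n run of `dna`."""
    cleaned = re.sub(r"\s+", "", dna).upper()
    runs = re.findall(r"(?:CAG)+", cleaned)
    return max((len(run) // 3 for run in runs), default=0)


def within_claimed_range(repeats: int, low: int = 11, high: int = 34) -> bool:
    """True if the repeat count falls in the 11 to 34 range recited in claim 19."""
    return low <= repeats <= high


# Hypothetical fragment: 21 uninterrupted CAG repeats, then CAA CAG and flanking bases.
sample = "ATGGCG" + "CAG" * 21 + "CAACAGCCGCCA"
count = longest_cag_run(sample)
print(count, within_claimed_range(count))  # 21 True

expanded = "ATGGCG" + "CAG" * 40 + "CCGCCA"
print(longest_cag_run(expanded), within_claimed_range(longest_cag_run(expanded)))  # 40 False
```

In practice the repeat length is usually inferred from fragment size after amplification or blotting, as recited in claim 15, so a string scan like this applies only where sequence data are available.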
16. The use of:

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective);

and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;

in the preparation of an agent for treating, delaying or preventing a neurodegenerative disorder. 

17. The use according to claim 16 which is gene therapy.

18. The use according to claim 16 or 17 for treating, preventing or delaying Huntington's disease.

19. The use according to any of claims 16 to 17 wherein the nucleic acid has from 11 to 34 (CAG) repeats 
and/or the polypeptide has from 11 to 34 Gln repeats, said repeats being consecutive.

20. A diagnostic and/or immunoassay kit comprising at least one container and; 

(a) a nucleic acid molecule according to any of claims 4 to 6, optionally labelled; or 

(b) an antibody according to claim 11, optionally labelled. 

The use of: 

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective); 
and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;
in the preparation of a medicament. 

A pharmaceutical composition comprising: 

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective); 
and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;
in admixture with a pharmaceutically acceptable carrier.

A process for the preparation of a polypeptide, the process comprising culturing a host cell according to 
claim 10 under conditions whereby the polypeptide is expressed, and purifying or isolating the polypep- 
tide. 